Abstract

Understanding information exchange and aggregation on networks is a central problem in 
theoretical economics, probability and statistics. We study a standard model of economic agents 
on the nodes of a social network graph who learn a binary "state of the world" S, from initial 
signals, by repeatedly observing each other's best guesses. 

Asymptotic learning is said to occur on a family of graphs $G_n = (V_n, E_n)$ with $|V_n| \to \infty$ if,
with probability tending to 1 as $n \to \infty$, all agents in $G_n$ eventually estimate $S$ correctly. We
identify sufficient conditions for asymptotic learning and construct examples where learning does
not occur when the conditions do not hold.

1 Introduction 

We consider a directed graph G representing a social network. The nodes of the graph are the set 
of agents V, and an edge from agent u to w indicates that u can observe the actions of w. The 
agents try to estimate a binary state of the world $S \in \{0, 1\}$, where each of the two possible states
occurs with probability one half. 

The agents are initially provided with private signals which are informative with respect to S 
and i.i.d., conditioned on $S$: there are two distributions, $\mu_0 \neq \mu_1$, such that conditioned on $S$, the
private signals are independent and distributed $\mu_S$.

In each time period $t \in \mathbb{N}$, each agent $v$ chooses an "action" $A_v(t)$, which equals whichever of
{0, 1} the state of the world is more likely to equal, conditioned on the information available to v 
at time t. This information includes its private signal, as well as the actions of its social network 
neighbors in the previous periods. 

A first natural question is whether the agents eventually reach consensus, or whether it is 
possible that neighbors "agree to disagree" and converge to different actions. Assuming that the 
agents do reach consensus regarding their estimate of S, a second natural question is whether this 
consensus estimator is equal to S. Certainly, since private signals are independent conditioned 
on S, a large enough group of agents has, in the aggregation of their private signals, enough 
information to learn S with high probability. However, it may be the case that this information is 
not disseminated by the above described process. These and related questions have been studied
extensively in economics, statistics and operations research; see Section 1.1.

We say that the agents learn on a social network graph G when all their actions converge to 
the state of the world $S$. For a sequence of graphs $\{G_n\}_{n=1}^{\infty}$ such that $G_n$ has $n$ agents, we say
that asymptotic learning occurs when the probability that the agents learn on $G_n$ tends to one as
$n$ tends to infinity, for a fixed choice of private signal distributions $\mu_0$ and $\mu_1$.

An agent's initial private belief is the probability that $S = 1$, conditioned only on its private
signal. When the distribution of private beliefs is atomic, asymptotic learning does not necessarily
occur (see Example A.1). This is also the case when the social network graph is not undirected (see
Example 2.7). Our main result (Theorem 3) is that asymptotic learning occurs for non-atomic
private beliefs and undirected graphs.

To prove this theorem we first prove that the condition of non-atomic initial private beliefs
implies that the agents all converge to the same action, or all don't converge at all (Theorem 1).
We then show that for any model in which this holds, asymptotic learning occurs (Theorem 2).
Note that it has been shown that agents reach agreement under various other conditions (cf.
Menager [12]). Hence, by Theorem 2, asymptotic learning also holds for these models.

Our proof includes several novel insights into the dynamics of interacting Bayesian agents. 
Broadly, we show that on undirected social network graphs connecting a countably infinite number 
of agents, if all agents converge to the same action then they converge to the correct action. This 
follows from the observation that if agents in distant parts of a large graph converge to the same 
action then they do so almost independently. We then show that this implies that for finite graphs 
of growing size the probability of learning approaches one. 

At the heart of this proof lies a topological lemma (Lemma 3.13) which may be of independent
interest; the topology here is one of rooted graphs (see, e.g., Benjamini and Schramm [5], Aldous
and Steele [1]). The fact that asymptotic learning occurs for undirected graphs (as opposed to
general strongly connected graphs) is related to the fact that sets of bounded degree, undirected
graphs are compact in this topology. In fact, our proof applies equally to any such compact set. For
example, one can replace undirected with $L$-locally strongly connected: a directed graph $G = (V, E)$
is $L$-locally strongly connected if, for each $(u, w) \in E$, there exists a path in $G$ of length at most
$L$ from $w$ to $u$. Asymptotic learning also takes place on $L$-locally strongly connected graphs, for
fixed $L$, since sets of $L$-locally strongly connected, uniformly bounded degree graphs are compact.
See Section 3.7 for further discussion.



1.1 Related literature 
1.1.1 Agreement 

There is a vast economic literature studying the question of convergence to consensus in dynamic 
processes and games. A founding work is Aumann's seminal Agreement Theorem [2], which states
that Bayesian agents who observe beliefs (i.e., posterior probabilities, as opposed to actions in our
model) cannot "agree to disagree". Subsequent work (notably Geanakoplos and Polemarchakis [10],
Parikh and Krasucki [13], McKelvey and Page [11], Gale and Kariv [9], Menager [12] and Rosenberg,
Solan and Vieille [15]) expanded the range of models that display convergence to consensus. One is,
in fact, left with the impression that it takes a pathological model to feature interacting Bayesian
agents who do "agree to disagree".

Menager [12] in particular describes a model similar to ours and proves that consensus is achieved
in a social network setting under the condition that the probability space is finite and ties cannot
occur (i.e., posterior beliefs are always different from one half). Note that our asymptotic learning
result applies to any model where consensus is guaranteed, and hence in particular applies to
models satisfying Menager's conditions.

1.1.2 Agents on social networks 

Gale and Kariv [9] also consider Bayesian agents who observe each other's actions. They introduce 
a model in which, as in ours, agents receive a single initial private signal, and the action space is 
discrete. However, there is no "state of the world" or conditionally i.i.d. private signals. Instead, 
the relative merit of each possible action depends on all the private signals. Our model is in fact 
a particular case of their model, where we restrict our attention to the particular structure of the 
private signals described above. 

Gale and Kariv show (loosely speaking) that neighboring agents who converge to two different 
actions must, at the limit, be indifferent with respect to the choice between these two actions. 
Their result is therefore also an agreement result, and makes no statement on the optimality of 
the chosen actions, although they do profess interest in the question of "... whether the common 
action chosen asymptotically is optimal, in the sense that the same action would be chosen if all the 
signals were public information... there is no reason why this should be the case." This is precisely 
the question we address. 

A different line of work is the one explored by Ellison and Fudenberg [8]. They study agents 
on a social network that use rules of thumb rather than full Bayesian updates. A similar approach 
is taken by Bala and Goyal [3], who also study agents acting iteratively on a social network. They
too are interested in asymptotic learning (or "complete learning", in their terms). They consider 
a model of bounded rationality which is not completely Bayesian. One of their main reasons for 
doing so is the mathematical complexity of the fully Bayesian model, or as they state, "to keep 
the model mathematically tractable... this possibility [fully Bayesian agents] is precluded in our 
model... simplifying the belief revision process considerably." In this simpler, non-Bayesian model, 
Bala and Goyal show both behaviors of asymptotic learning and results of non-learning, depending 
on various parameters of their model. 

1.1.3 Herd behavior 

The "herd behavior" literature (cf. Banerjee [3], Bikhchandani, Hirshleifer and Welch [6], Smith
and Sørensen [16]) considers related but fundamentally simpler models. As in our model there is a
"state of the world" and conditionally independent private signals. A countably infinite group of 
agents is exogenously ordered, and each picks an action sequentially, after observing the actions of 
its predecessors or some of its predecessors. Agents here act only once, as opposed to our model in 
which they act repeatedly. 

The main result for these models is that in some situations there may arise an "information 
cascade", where, with positive probability, almost all the agents take the wrong action. This is 
precisely the opposite of asymptotic learning. The condition for information cascades is "bounded 
private beliefs"; herd behavior occurs when the agents' beliefs, as inspired by their private signals, 
are bounded away both from zero and from one [16]. In contrast, we show that in our model
asymptotic learning occurs even for bounded beliefs. 

In the herd behavior models information only flows in one direction: if agent $u$ learns from $w$
then w does not learn from u. This significant difference, among others, makes the tools used for 
their analysis irrelevant for our purposes. 



2 Formal definitions, results and examples 
2.1 Main definitions 

The following definition of the agents, the state of the world and the private signals is adapted 
from [13], where a similar model is discussed.

Definition 2.1. Let $(\Omega, \mathcal{O})$ be a measurable space. Let $\mu_0$ and $\mu_1$ be different and mutually absolutely
continuous probability measures on $(\Omega, \mathcal{O})$.

Let $\delta_0$ and $\delta_1$ be the distributions on $\{0, 1\}$ such that $\delta_0(0) = \delta_1(1) = 1$.

Let $V$ be a countable (finite or infinite) set of agents, and let

\[ \mathbb{P} = \tfrac{1}{2}\, \delta_0 \times \mu_0^V + \tfrac{1}{2}\, \delta_1 \times \mu_1^V \]

be a distribution over $\{0, 1\} \times \Omega^V$. We denote by $S \in \{0, 1\}$ the state of the world and by $W_u$
the private signal of agent $u \in V$. Let

\[ (S, W_{u_1}, W_{u_2}, \ldots) \sim \mathbb{P}. \]

Note that the private signals $W_u$ are i.i.d., conditioned on $S$: if $S = 0$ (which happens with
probability one half) the private signals are distributed i.i.d. $\mu_0$, and if $S = 1$ then they are distributed
i.i.d. $\mu_1$.

We now define the dynamics of the model. 

Definition 2.2. Consider a set of agents $V$, a state of the world $S$ and private signals $\{W_u : u \in V\}$
such that

\[ (S, W_{u_1}, W_{u_2}, \ldots) \sim \mathbb{P}, \]

as defined in Definition 2.1.

Let $G = (V, E)$ be a directed graph which we shall call the social network. We assume
throughout that $G$ is simple (i.e., no parallel edges or loops) and strongly connected. Let the set of
neighbors of $u$ be $N(u) = \{v : (u,v) \in E\}$. The out-degree of $u$ is equal to $|N(u)|$.

For each time period $t \in \{1, 2, \ldots\}$ and agent $u \in V$, denote the action of agent $u$ at time $t$ by
$A_u(t)$, and denote by $\mathcal{F}_u(t)$ the information available to agent $u$ at time $t$. They are jointly defined
by

\[ \mathcal{F}_u(t) = \sigma\big(W_u, \{A_v(t') : v \in N(u),\ t' < t\}\big), \]

and

\[ A_u(t) \begin{cases} = 0 & \text{if } \mathbb{P}[S = 1 \mid \mathcal{F}_u(t)] < 1/2, \\ = 1 & \text{if } \mathbb{P}[S = 1 \mid \mathcal{F}_u(t)] > 1/2, \\ \in \{0, 1\} & \text{if } \mathbb{P}[S = 1 \mid \mathcal{F}_u(t)] = 1/2. \end{cases} \]

Let $X_u(t) = \mathbb{P}[S = 1 \mid \mathcal{F}_u(t)]$ be agent $u$'s belief at time $t$.

Informally stated, $A_u(t)$ is agent $u$'s best estimate of $S$ given the information $\mathcal{F}_u(t)$ available
to it up to time $t$. The information available to it is its private signal $W_u$ and the actions of its
neighbors in $G$ in the previous time periods.

Remark 2.3. An alternative and equivalent definition of $A_u(t)$ is the MAP estimator of $S$, as
calculated by agent $u$ at time $t$:

\[ A_u(t) = \operatorname*{argmax}_{s \in \{0,1\}} \mathbb{P}[S = s \mid \mathcal{F}_u(t)] = \operatorname*{argmax}_{A \in \mathcal{F}_u(t)} \mathbb{P}[A = S], \]

with some tie-breaking rule. Here $A \in \mathcal{F}_u(t)$ ranges over the $\{0,1\}$-valued random variables
that are measurable with respect to $\mathcal{F}_u(t)$.

Note that we assume nothing about how agents break ties, i.e., how they choose their action
when, conditioned on their available information, there is equal probability for $S$ to equal either
0 or 1.

Note also that the belief of agent $u$ at time $t = 1$, $X_u(1)$, depends only on $W_u$:

\[ X_u(1) = \mathbb{P}[S = 1 \mid W_u]. \]

We call $X_u(1)$ the initial belief of agent $u$.
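
As an illustration of the initial belief and the first action, the following Python sketch assumes (purely for the sake of example; the text does not fix a signal distribution) that $\mu_s$ is Gaussian with mean $m_s$ and common standard deviation $\sigma$. The function names and default parameters are hypothetical. Such signals have a continuous density, so the initial belief is non-atomic.

```python
import math

def initial_belief(w, m0=-1.0, m1=1.0, sigma=1.0):
    """P[S = 1 | W_u = w] under a uniform prior on S.

    Assumption (not from the text): mu_s = Normal(m_s, sigma^2).
    """
    # Log-likelihood ratio log(dmu_1/dmu_0)(w) for two Gaussians
    # with a common variance.
    llr = ((w - m0) ** 2 - (w - m1) ** 2) / (2.0 * sigma ** 2)
    # Numerically stable logistic function of the log-likelihood ratio.
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)

def first_action(w, **kwargs):
    """Action at time t = 1: the more likely state given the signal alone.

    Returns None on a tie, since the model places no constraint on
    how ties are broken.
    """
    x = initial_belief(w, **kwargs)
    if x > 0.5:
        return 1
    if x < 0.5:
        return 0
    return None
```

With the default parameters a signal of $w = 0$ is uninformative (belief exactly one half), while positive signals favor $S = 1$.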

Definition 2.4. Let $\mu_0$ and $\mu_1$ be such that $X_u(1)$, the initial belief of $u$, has a non-atomic
distribution (equivalently, the distributions of the initial beliefs of all agents are non-atomic). Then we say that
the pair $(\mu_0, \mu_1)$ induces non-atomic beliefs.

We next define some limiting random variables: $\mathcal{F}_u$ is the limiting information available to $u$,
and $X_u$ is its limiting belief.

Definition 2.5. Denote $\mathcal{F}_u = \sigma\big(\bigcup_t \mathcal{F}_u(t)\big)$, and let

\[ X_u = \mathbb{P}[S = 1 \mid \mathcal{F}_u]. \]

Note that the limit $\lim_{t \to \infty} X_u(t)$ almost surely exists and equals $X_u$, since $X_u(t)$ is a bounded
martingale.
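
For completeness, the martingale property used here can be spelled out in one line. The derivation below is a sketch that relies only on the fact that the information of agent $u$ is non-decreasing in time, $\mathcal{F}_u(t) \subseteq \mathcal{F}_u(t+1)$, and on the tower property of conditional expectation:

```latex
\begin{align*}
\mathbb{E}\big[X_u(t+1) \mid \mathcal{F}_u(t)\big]
  &= \mathbb{E}\big[\,\mathbb{P}[S = 1 \mid \mathcal{F}_u(t+1)] \;\big|\; \mathcal{F}_u(t)\big] \\
  &= \mathbb{P}[S = 1 \mid \mathcal{F}_u(t)] \\
  &= X_u(t).
\end{align*}
```

Since $0 \le X_u(t) \le 1$, the martingale convergence theorem gives almost sure convergence of $X_u(t)$.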

We would like to define the limiting action of agent $u$. However, it might be the case that
agent $u$ takes both actions infinitely often, or that otherwise, at the limit, both actions are equally
desirable. We therefore define $A_u$ to be the limiting optimal action set. It can take the values $\{0\}$,
$\{1\}$ or $\{0,1\}$.

Definition 2.6. Let $A_u$, the optimal action set of agent $u$, be defined by

\[ A_u = \begin{cases} \{0\} & X_u < 1/2, \\ \{1\} & X_u > 1/2, \\ \{0,1\} & X_u = 1/2. \end{cases} \]

Note that if $a$ is an action that $u$ takes infinitely often then $a \in A_u$, but that if (say) 0 is the
only action that $u$ takes infinitely often then it still may be the case that $A_u = \{0, 1\}$. However,
we show below that when $(\mu_0, \mu_1)$ induce non-atomic beliefs then $A_u$ is almost surely equal to the
set of actions that $u$ takes infinitely often.

2.2 Main results

In our first theorem we show that when initial private beliefs are non-atomic, then in the limit
$t \to \infty$ the optimal action sets of the players are identical. As Example A.1 indicates, this may not
hold when private beliefs are atomic.

Theorem 1. Let $(\mu_0, \mu_1)$ induce non-atomic beliefs. Then there exists a random variable $A$ such
that almost surely $A_u = A$ for all $u$.

I.e., when initial private beliefs are non-atomic then agents, at the limit, agree on the optimal 
action. The following theorem states that when such agreement is guaranteed then the agents learn 
the state of the world with high probability, when the number of agents is large. This phenomenon 
is known as asymptotic learning. This theorem is our main result. 

Theorem 2. Let $\mu_0, \mu_1$ be such that for every connected, undirected graph $G$ there exists a random
variable $A$ such that almost surely $A_u = A$ for all $u \in V$. Then there exists a sequence $q(n) =
q(n, \mu_0, \mu_1)$ such that $q(n) \to 1$ as $n \to \infty$, and $\mathbb{P}[A = \{S\}] \geq q(n)$, for any choice of undirected,
connected graph $G$ with $n$ agents.

Informally, when agents agree on optimal action sets then they necessarily learn the correct 
state of the world, with probability that approaches one as the number of agents grows. This holds 
uniformly over all possible connected and undirected social network graphs. 

The following theorem is a direct consequence of the two theorems above, since the property
proved by Theorem 1 is the condition required by Theorem 2.

Theorem 3. Let $\mu_0$ and $\mu_1$ induce non-atomic beliefs. Then there exists a sequence $q(n) =
q(n, \mu_0, \mu_1)$ such that $q(n) \to 1$ as $n \to \infty$, and $\mathbb{P}[A_u = \{S\}] \geq q(n)$, for all agents $u$ and for
any choice of undirected, connected $G$ with $n$ agents.



2.3 Note on directed vs. undirected graphs 

Note that we require that the graph $G$ not only be strongly connected, but also undirected (so
that if $(u,v) \in E$ then $(v,u) \in E$). The following example (depicted in Figure 1) shows that when
private beliefs are bounded then asymptotic learning may not occur when the graph is strongly
connected but not undirected (see footnote 1).

Example 2.7. Consider the following graph. The vertex set is comprised of two groups of
agents: a "royal family" clique of 5 agents who all observe each other, and $n - 5$ agents (the
"public") who are connected in a chain, and in addition can all observe all the agents in the royal
family. Finally, a single member of the royal family observes one of the public, so that the graph is
strongly connected.

Now, with positive probability, which is independent of n, there occurs the event that all the 
members of the royal family initially take the wrong action. Assuming the private signals are 
sufficiently weak, then it is clear that all the agents of the public will adopt the wrong opinion of 
the royal family and will henceforth choose the wrong action. 

Note that the removal of one edge (the one from the royal back to the commoners) results
in this graph no longer being strongly connected. However, the information added by this edge
rarely has an effect on the final outcome of the process. This indicates that strong connectedness
is too weak a notion of connectedness in this context. We therefore seek stronger notions such
as connectedness in undirected graphs.

Figure 1: The five members of the royal family (on the right) all observe each other. The rest
of the agents (the public) all observe the royal family (as suggested by the three thick arrows in
the middle) and their immediate neighbors. Finally, one of the royals observes one of the public,
so that the graph is strongly connected. This is an example of how asymptotic learning does not
necessarily occur when the graph is not undirected.

Footnote 1: We draw on Bala and Goyal's [3] royal family graph.

A weaker notion of connectedness is that of L-locally strongly connected graphs, which we 
defined above. For any L, the graph from Example 2.7 is not L-locally strongly connected for n 
large enough. 
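
To make this concrete, the following Python sketch builds the graph of Example 2.7 and computes the smallest $L$ for which it is $L$-locally strongly connected. The vertex numbering (royals $0, \ldots, 4$, public $5, \ldots, n-1$), the choice that royal 0 observes the public agent at the start of the chain, and all function names are illustrative assumptions; the text does not specify which royal observes which member of the public.

```python
import math
from collections import deque

def royal_family_graph(n, royals=5):
    """Adjacency dict: graph[u] is the set of agents that u observes."""
    assert n > royals
    graph = {u: set() for u in range(n)}
    for u in range(royals):
        # The royal family is a clique.
        graph[u] = {w for w in range(royals) if w != u}
    for u in range(royals, n):
        # Each member of the public observes every royal ...
        graph[u] |= set(range(royals))
        # ... and its immediate neighbors in the chain.
        if u > royals:
            graph[u].add(u - 1)
        if u < n - 1:
            graph[u].add(u + 1)
    # Assumption: royal 0 observes the first member of the public.
    graph[0].add(royals)
    return graph

def distances_from(graph, source):
    """Directed BFS distances from source along observation edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_strongly_connected(graph):
    """True if every vertex reaches every other vertex."""
    return all(len(distances_from(graph, u)) == len(graph) for u in graph)

def smallest_local_radius(graph):
    """Smallest L such that every edge (u, w) has a path w -> u of length <= L."""
    worst = 0
    for w in graph:
        dist = distances_from(graph, w)
        for u in graph:
            if w in graph[u]:  # (u, w) is an edge
                if u not in dist:
                    return math.inf
                worst = max(worst, dist[u])
    return worst
```

Under these assumptions the returned value is $n - 4$: the return path from a royal other than royal 0 to the far end of the chain must pass through royal 0 and then traverse the whole chain. The exact value depends on which public agent is observed, but in any case it grows with $n$.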



3 Proofs 

Before delving into the proofs of Theorems 1 and 2 we introduce additional definitions in
subsection 3.1 and prove some general lemmas in subsections 3.2, 3.3 and 3.4. Note that Lemma 3.13,
which is the main technical insight in the proof of Theorem 2, may be of independent interest. We
prove Theorem 2 in subsection 3.5 and Theorem 1 in subsection 3.6.

3.1 Additional general notation

Definition 3.1. We denote the log-likelihood ratio of agent $u$'s belief at time $t$ by

\[ Z_u(t) = \log \frac{X_u(t)}{1 - X_u(t)}, \]

and let

\[ Z_u = \lim_{t \to \infty} Z_u(t). \]

Note that

\[ Z_u(t) = \log \frac{\mathbb{P}[S = 1 \mid \mathcal{F}_u(t)]}{\mathbb{P}[S = 0 \mid \mathcal{F}_u(t)]}, \]

and that

\[ Z_u(1) = \log \frac{d\mu_1}{d\mu_0}(W_u). \]

Note also that $Z_u(t)$ converges almost surely (with limit in $[-\infty, \infty]$) since $X_u(t)$ does.

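
The identity for $Z_u(1)$ follows from Bayes' rule and the uniform prior on $S$. The short derivation below is a sketch; it uses the mutual absolute continuity of $\mu_0$ and $\mu_1$, which guarantees that the Radon-Nikodym derivative $\frac{d\mu_1}{d\mu_0}$ exists and is almost surely positive and finite:

```latex
\begin{align*}
X_u(1) = \mathbb{P}[S = 1 \mid W_u]
  &= \frac{\tfrac12 \frac{d\mu_1}{d\mu_0}(W_u)}{\tfrac12 + \tfrac12 \frac{d\mu_1}{d\mu_0}(W_u)}
   = \frac{\frac{d\mu_1}{d\mu_0}(W_u)}{1 + \frac{d\mu_1}{d\mu_0}(W_u)}, \\
\frac{X_u(1)}{1 - X_u(1)} &= \frac{d\mu_1}{d\mu_0}(W_u), \\
Z_u(1) &= \log \frac{d\mu_1}{d\mu_0}(W_u).
\end{align*}
```
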

Definition 3.2. We denote the sequence of actions of agent $u$ up to time $t$ by

\[ \bar{A}_u(t) = (A_u(1), \ldots, A_u(t-1)). \]

The sequence of all actions of $u$ is similarly denoted by

\[ \bar{A}_u = (A_u(1), A_u(2), \ldots). \]

We denote the actions of the neighbors of $u$ up to time $t$ by

\[ I_u(t) = \{\bar{A}_w(t) : w \in N(u)\} = \{A_w(t') : w \in N(u),\ t' < t\}, \]

and let $I_u$ denote all the actions of $u$'s neighbors:

\[ I_u = \{\bar{A}_w : w \in N(u)\} = \{A_w(t') : w \in N(u),\ t' \geq 1\}. \]

Note that using this notation we have that $\mathcal{F}_u(t) = \sigma(W_u, I_u(t))$ and $\mathcal{F}_u = \sigma(W_u, I_u)$.

Definition 3.3. We denote the probability that $u$ chooses the correct action at time $t$ by

\[ p_u(t) = \mathbb{P}[A_u(t) = S], \]

and accordingly

\[ p_u = \lim_{t \to \infty} p_u(t). \]

Definition 3.4. For a set of vertices U we denote by W(U) the private signals of the agents in U . 

3.2 Sequences of rooted graphs and their limits

In this section we define a topology on rooted graphs. We call convergence in this topology
convergence to local limits, and use it repeatedly in the proof of Theorem 2. The core of the proof of
Theorem 2 is the topological Lemma 3.13, which we prove here. This lemma is a claim related to
local graph properties, which we also introduce here.

Definition 3.5. Let $G = (V,E)$ be a finite or countably infinite graph, and let $u \in V$ be a vertex
in $G$. We denote by $(G, u)$ the rooted graph $G$ with root $u$.

Definition 3.6. Let $G = (V, E)$ and $G' = (V', E')$ be graphs. A bijection $h : V \to V'$ is a graph isomorphism
between $G$ and $G'$ if $(u,v) \in E \Leftrightarrow (h(u), h(v)) \in E'$.

Let $(G,u)$ and $(G',u')$ be rooted graphs. Then $h : V \to V'$ is a rooted graph isomorphism
between $(G,u)$ and $(G',u')$ if $h$ is a graph isomorphism and $h(u) = u'$.

We write $(G,u) \cong (G',u')$ whenever there exists a rooted graph isomorphism between the two
rooted graphs.

Given a (perhaps directed) graph $G = (V, E)$ and two vertices $u, w \in V$, the graph distance
$d(u,w)$ is equal to the length in edges of a shortest (directed) path from $u$ to $w$.

Definition 3.7. We denote by $B_r(G,u)$ the ball of radius $r$ around the vertex $u$ in the graph
$G = (V, E)$: let $V'$ be the set of vertices $w$ such that $d(u, w)$ is at most $r$. Let $E' = \{(v, w) \in E :
v, w \in V'\}$. Then $B_r(G,u)$ is the rooted graph with vertices $V'$, edges $E'$ and root $u$.
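
The ball of Definition 3.7 can be computed by a breadth-first search truncated at depth $r$. The Python sketch below is one possible implementation; the adjacency-dict representation (`graph[v]` is the set of out-neighbors of `v`) and the function name `ball` are assumptions made for illustration.

```python
from collections import deque

def ball(graph, root, r):
    """Return (vertices, edges, root) of the ball of radius r around root.

    graph[v] is the set of out-neighbors of v; distances are measured
    along directed paths starting at the root.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == r:
            continue  # do not explore beyond radius r
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    vertices = set(dist)
    # Keep every edge of the graph whose endpoints both lie in the ball.
    edges = {(v, w) for v in vertices for w in graph[v] if w in vertices}
    return vertices, edges, root
```

Note that the induced edge set may contain edges that no shortest path uses, for instance the edge closing a directed cycle once the whole cycle lies within radius $r$.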

We next define a topology on strongly connected rooted graphs (or rather on their isomorphism
classes; we shall simply refer to these classes as graphs). A natural metric between strongly
connected rooted graphs is the following (see Benjamini and Schramm [5], Aldous and Steele [1]).
Given $(G,u)$ and $(G',u')$, let

\[ D((G,u),(G',u')) = 2^{-R}, \]

where

\[ R = \sup\{r : B_r(G, u) \cong B_r(G',u')\}. \]

This is indeed a metric: the triangle inequality follows immediately, and a standard diagonalization
argument is needed to show that if $D((G, u), (G', u')) = 0$ then $(G, u) \cong (G', u')$.

This metric induces a topology that will be useful to us. As usual, the basis of this topology is
the set of balls of the metric; the ball of radius $2^{-R}$ around the graph $(G, u)$ is the set of graphs
$(G',u')$ such that $B_R(G,u) \cong B_R(G',u')$. We refer to convergence in this topology as convergence
to a local limit, and provide the following equivalent definition for it:

Definition 3.8. Let $\{(G_r, u_r)\}_{r=1}^{\infty}$ be a sequence of strongly connected rooted graphs. We say that
the sequence converges if there exists a strongly connected rooted graph $(G',u')$ such that

\[ B_r(G',u') \cong B_r(G_r,u_r), \]

for all $r \geq 1$. We then write

\[ (G',u') = \lim_{r \to \infty} (G_r,u_r), \]

and call $(G',u')$ the local limit of the sequence $\{(G_r,u_r)\}_{r=1}^{\infty}$.

Let $\mathcal{G}_d$ be the set of strongly connected rooted graphs with degree at most $d$. Another standard
diagonalization argument shows that $\mathcal{G}_d$ is compact (see again [5, 1]). Then, since the space is
metric, every sequence in $\mathcal{G}_d$ has a converging subsequence:

Lemma 3.9. Let $\{(G_r,u_r)\}_{r=1}^{\infty}$ be a sequence of rooted graphs in $\mathcal{G}_d$. Then there exists a
subsequence $\{(G_{r_i},u_{r_i})\}_{i=1}^{\infty}$ with $r_{i+1} > r_i$ for all $i$, such that $\lim_{i \to \infty} (G_{r_i}, u_{r_i})$ exists.

We next define local properties of rooted graphs. 

Definition 3.10. Let $P$ be a property of rooted graphs or a Boolean predicate on rooted graphs. We
write $(G, u) \in P$ if $(G, u)$ has the property, and $(G, u) \notin P$ otherwise.

We say that $P$ is a local property if, for every $(G, u) \in P$ there exists an $r > 0$ such that if
$B_r(G,u) \cong B_r(G',u')$, then $(G',u') \in P$. Let $r$ be such that $B_r(G,u) \cong B_r(G',u') \Rightarrow (G', u') \in P$.
Then we say that $(G, u)$ has property $P$ with radius $r$, and denote $(G, u) \in P^{(r)}$.

That is, if (G, u) has a local property P then there is some r such that knowing the ball of radius 
r around u in G is sufficient to decide that (G, u) has the property P. An alternative name for a 
local property would therefore be a locally decidable property. In our topology, local properties are 
nothing but open sets: the definition above states that if $(G, u) \in P$ then there exists an element
of the basis of the topology that includes (G, u) and is also in P. This is a necessary and sufficient 
condition for P to be open. 

We use this fact to prove the following lemma. 

Definition 3.11. Let $\mathcal{B}_d$ be the set of infinite, connected, undirected graphs of degree at most $d$,
and let $\mathcal{B}_d^r$ be the set of $\mathcal{B}_d$-rooted graphs

\[ \mathcal{B}_d^r = \{(G,u) : G \in \mathcal{B}_d,\ u \in G\}. \]

Lemma 3.12. $\mathcal{B}_d^r$ is compact.

Proof. Lemma 3.9 states that $\mathcal{G}_d$, the set of strongly connected rooted graphs of degree at most $d$,
is compact. Since $\mathcal{B}_d^r$ is a subset of $\mathcal{G}_d$, it remains to show that $\mathcal{B}_d^r$ is closed in $\mathcal{G}_d$.

The complement of $\mathcal{B}_d^r$ in $\mathcal{G}_d$ is the set of graphs in $\mathcal{G}_d$ that are either finite or directed. These are
both local properties: if $(G, u)$ is finite (or directed), then there exists a radius $r$ such that examining
$B_r(G,u)$ is enough to determine that it is finite (or directed). Hence the sets of finite graphs and
directed graphs in $\mathcal{G}_d$ are open in $\mathcal{G}_d$, their union is open in $\mathcal{G}_d$, and the complement of this union, $\mathcal{B}_d^r$, is
closed in $\mathcal{G}_d$. □

We now state and prove the main lemma of this subsection. Note that the set of graphs $\mathcal{B}_d$
satisfies the conditions of this lemma.

Lemma 3.13. Let $\mathcal{A}$ be a set of infinite, strongly connected graphs, let $\mathcal{A}^r$ be the set of $\mathcal{A}$-rooted
graphs

\[ \mathcal{A}^r = \{(G,u) : G \in \mathcal{A},\ u \in G\}, \]

and assume that $\mathcal{A}$ is such that $\mathcal{A}^r$ is compact.

Let $P$ be a local property such that for each $G \in \mathcal{A}$ there exists a vertex $w \in G$ such that
$(G, w) \in P$. Then for each $G \in \mathcal{A}$ there exist an $r_0$ and infinitely many distinct vertices $\{w_n\}_{n=1}^{\infty}$
such that $(G,w_n) \in P^{(r_0)}$ for all $n$.

Proof. Let $G$ be an arbitrary graph in $\mathcal{A}$. Consider a sequence $\{v_r\}_{r=1}^{\infty}$ of vertices in $G$ such that
for all distinct $r, s \in \mathbb{N}$ the balls $B_r(G, v_r)$ and $B_s(G, v_s)$ are disjoint.

Since $\mathcal{A}^r$ is compact, the sequence $\{(G, v_r)\}_{r=1}^{\infty}$ has a converging subsequence $\{(G, v_{r_i})\}_{i=1}^{\infty}$ with
$r_{i+1} > r_i$. Write $u_i = v_{r_i}$, and let

\[ (G',u') = \lim_{i \to \infty} (G,u_i). \]

Note that since $\mathcal{A}^r$ is compact, $(G', u') \in \mathcal{A}^r$ and in particular $G' \in \mathcal{A}$ is an infinite, strongly
connected graph. Note also that since $r_i \geq i$, it also holds that the balls $B_i(G, u_i)$ and $B_j(G, u_j)$
are disjoint for all distinct $i, j \in \mathbb{N}$.

Since $G' \in \mathcal{A}$, there exists a vertex $w' \in G'$ such that $(G', w') \in P$. Since $P$ is a local property,
$(G',w') \in P^{(r_0)}$ for some $r_0$, so that if $B_{r_0}(G',w') \cong B_{r_0}(G,w)$ then $(G,w) \in P$.

Let $R = d(u',w') + r_0$, so that $B_{r_0}(G',w') \subseteq B_R(G',u')$. Then, since the sequence $(G,u_i)$
converges to $(G',u')$, for all $i \geq R$ it holds that $B_R(G,u_i) \cong B_R(G',u')$. Therefore, for all $i \geq R$
there exists a vertex $w_i \in B_R(G,u_i)$ such that $B_{r_0}(G,w_i) \cong B_{r_0}(G',w')$. Hence $(G,w_i) \in P^{(r_0)}$ for
all $i \geq R$ (see Fig. 2). Furthermore, for distinct $i, j \geq R$, the balls $B_R(G,u_i)$ and $B_R(G,u_j)$ are disjoint,
and so $w_i \neq w_j$.

We have therefore shown that the vertices $\{w_i\}_{i \geq R}$ are an infinite set of distinct vertices such
that $(G,w_i) \in P^{(r_0)}$, as required. □



3.3 Coupling isomorphic balls 

This section includes three claims that we will use repeatedly later. Their spirit is that everything 
that happens to an agent up to time t depends only on the state of the world and a ball of radius 
t around it. 

Recall that $\mathcal{F}_u(t)$, the information available to agent $u$ at time $t$, is the $\sigma$-algebra generated by $W_u$
and $A_w(t')$ for all neighbors $w$ of $u$ and $t' < t$. Recall that $I_u(t)$ denotes this exact set of actions:

\[ I_u(t) = \{\bar{A}_w(t) : w \in N(u)\} = \{A_w(t') : w \in N(u),\ t' < t\}, \]

where $\bar{A}_w(t) = (A_w(1), \ldots, A_w(t-1))$ are the actions of $w$ before time $t$ (Definition 3.2).

Claim 3.14. For all agents $u$ and times $t$, $I_u(t)$ is a deterministic function of $W(B_t(G,u))$.

Recall (Definition 3.4) that $W(B_t(G,u))$ are the private signals of the agents in $B_t(G,u)$, the
ball of radius $t$ around $u$ (Definition 3.7).

Proof. We prove by induction on $t$. $I_u(1)$ is empty, and so the claim holds for $t = 1$.

Assume the claim holds up to time $t$. By definition, $A_u(t+1)$ is a function of $W_u$ and of $I_u(t+1)$,
which includes $\{A_w(t') : w \in N(u),\ t' \leq t\}$. $A_w(t')$ is a function of $W_w$ and $I_w(t')$, and hence by the
inductive assumption it is a function of $W(B_{t'}(G, w))$. Since $t' < t + 1$ and the distance between $u$
and $w$ is one, $W(B_{t'}(G, w)) \subseteq W(B_{t+1}(G,u))$, for all $w \in N(u)$ and $t' \leq t$. Hence $I_u(t + 1)$ is a
function of $W(B_{t+1}(G,u))$, the private signals in $B_{t+1}(G,u)$. □



The following lemma follows from Claim 3.14 above:

Lemma 3.15. Consider two processes with identical private signal distributions $(\mu_0, \mu_1)$, on
different graphs $G = (V, E)$ and $G' = (V', E')$.

Let $t \geq 1$, $u \in V$ and $u' \in V'$ be such that there exists a rooted graph isomorphism $h : B_t(G, u) \to
B_t(G',u')$.

Let $M$ be a random variable that is measurable in $\mathcal{F}_u(t)$. Then there exists an $M'$ that is
measurable in $\mathcal{F}_{u'}(t)$ such that the distribution of $(M, S)$ is identical to the distribution of $(M', S')$.

Recall that a graph isomorphism between $G = (V,E)$ and $G' = (V',E')$ is a bijective function
$h : V \to V'$ such that $(u, v) \in E$ iff $(h(u), h(v)) \in E'$.

Proof. Couple the two processes by setting $S = S'$, and letting $W_w = W_{w'}$ when $h(w) = w'$. Note
that it follows that $W_u = W_{u'}$. By Claim 3.14 we have that $I_u(t) = I_{u'}(t)$, when using $h$ to identify
vertices in $V$ with vertices in $V'$.

Since $M$ is measurable in $\mathcal{F}_u(t)$, it must, by the definition of $\mathcal{F}_u(t)$, be a function of $I_u(t)$ and $W_u$.
Denote then $M = f(I_u(t), W_u)$. Since we showed that $I_u(t) = I_{u'}(t)$, if we let $M' = f(I_{u'}(t), W_{u'})$
then the distributions of $(M, S)$ and $(M', S')$ will be identical. □

In particular, we use this lemma in the case where $M$ is an estimator of $S$. Then this lemma
implies that the probability that $M = S$ is equal to the probability that $M' = S'$.

Recall that $p_u(t) = \mathbb{P}[A_u(t) = S] = \max_{A \in \mathcal{F}_u(t)} \mathbb{P}[A = S]$. Hence we can apply this lemma
(Lemma 3.15) to $A_u(t)$ and $A_{u'}(t)$:

Corollary 3.16. If $B_t(G,u)$ and $B_t(G',u')$ are isomorphic then $p_u(t) = p_{u'}(t)$.

3.4 $\delta$-independence

To prove that agents learn $S$ we will show that the agents must, over the duration of this process,
gain access to a large number of measurements of $S$ that are almost independent. To formalize the
notion of almost-independence we define $\delta$-independence and prove some easy results about it. The
proofs in this subsection are relatively straightforward.

Let $\mu$ and $\nu$ be two measures defined on the same space. We denote the total variation distance
between them by $d_{TV}(\mu, \nu)$. Let $A$ and $B$ be two random variables with joint distribution $\mu_{(A,B)}$.
Then we denote by $\mu_A$ the marginal distribution of $A$, $\mu_B$ the marginal distribution of $B$, and
$\mu_A \times \mu_B$ the product distribution of the marginal distributions.

Definition 3.17. Let $(X_1, X_2, \ldots, X_k)$ be random variables. We refer to them as $\delta$-independent
if their joint distribution $\mu_{(X_1, \ldots, X_k)}$ has total variation distance of at most $\delta$ from the product of
their marginal distributions $\mu_{X_1} \times \cdots \times \mu_{X_k}$:

\[ d_{TV}\big(\mu_{(X_1, \ldots, X_k)},\ \mu_{X_1} \times \cdots \times \mu_{X_k}\big) \leq \delta. \]

Likewise, $(X_1, \ldots, X_k)$ are $\delta$-dependent if the distance between the distributions is more than
$\delta$.
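
For random variables with finitely many values, the smallest $\delta$ for which Definition 3.17 holds can be computed directly. The following Python sketch does so; representing a joint distribution as a dict from outcome tuples to probabilities, and all function names, are assumptions made for this illustration.

```python
from itertools import product

def tv_distance(p, q):
    """Total variation distance between two finite distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def marginals(joint):
    """Marginal distribution of each coordinate of a joint distribution."""
    k = len(next(iter(joint)))
    margs = [dict() for _ in range(k)]
    for outcome, prob in joint.items():
        for i, x in enumerate(outcome):
            margs[i][x] = margs[i].get(x, 0.0) + prob
    return margs

def product_of_marginals(joint):
    """Product distribution built from the marginals of the joint."""
    prod = {}
    for combo in product(*(m.items() for m in marginals(joint))):
        outcome = tuple(x for x, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        prod[outcome] = prob
    return prod

def dependence(joint):
    """Smallest delta such that the coordinates are delta-independent."""
    return tv_distance(joint, product_of_marginals(joint))
```

For example, two independent fair bits give `dependence` equal to 0, while two perfectly correlated fair bits, `{(0, 0): 0.5, (1, 1): 0.5}`, give 0.5.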

We remind the reader that a coupling $\nu$ between two random variables $A_1$ and $A_2$, distributed
$\nu_1$ and $\nu_2$, is a distribution on the product of the spaces $\Omega_1, \Omega_2$ such that the marginal of $A_i$ is $\nu_i$.
The total variation distance between $A_1$ and $A_2$ is equal to the minimum, over all such couplings
$\nu$, of $\nu(A_1 \neq A_2)$.

Hence to prove that $X, Y$ are $\delta$-independent it is sufficient to show that there exists a coupling
$\nu$ between $\nu_1$, the joint distribution of $(X, Y)$, and $\nu_2$, the product of the marginal distributions of
$X$ and $Y$, such that $\nu((X_1,Y_1) \neq (X_2,Y_2)) \leq \delta$.

Alternatively, to prove that $(A, B)$ are $\delta$-independent, one could directly bound the total
variation distance between $\mu_{(A,B)}$ and $\mu_A \times \mu_B$ by $\delta$. This is often done below using the fact that the
total variation distance satisfies the triangle inequality $d_{TV}(\mu, \nu) \leq d_{TV}(\mu, \gamma) + d_{TV}(\gamma, \nu)$.

We state and prove some straightforward claims regarding $\delta$-independence. 

Claim 3.18. Let $A$, $B$ and $C$ be random variables such that $\mathbb{P}[A \neq B] \le \delta$ and $(B, C)$ are 
$\delta'$-independent. Then $(A, C)$ are $(2\delta + \delta')$-independent. 

Proof. Let $\mu_{(A,B,C)}$ be a joint distribution of $A$, $B$ and $C$ such that $\mathbb{P}[A \neq B] \le \delta$. 

Since $\mathbb{P}[A \neq B] \le \delta$, $\mathbb{P}[(A, C) \neq (B, C)] \le \delta$, in both cases that $A, B, C$ are picked from either 
$\mu_{(A,B,C)}$ or $\mu_{(A,B)} \times \mu_C$. Hence 

$$d_{TV}(\mu_{(A,C)}, \mu_{(B,C)}) \le \delta$$

and 

$$d_{TV}(\mu_A \times \mu_C, \mu_B \times \mu_C) \le \delta.$$

Since $(B, C)$ are $\delta'$-independent, 

$$d_{TV}(\mu_B \times \mu_C, \mu_{(B,C)}) \le \delta'.$$

The claim follows from the triangle inequality 

$$d_{TV}(\mu_{(A,C)}, \mu_A \times \mu_C) \le d_{TV}(\mu_{(A,C)}, \mu_{(B,C)}) + d_{TV}(\mu_{(B,C)}, \mu_B \times \mu_C) + d_{TV}(\mu_B \times \mu_C, \mu_A \times \mu_C) \le 2\delta + \delta'.$$

□ 

Claim 3.19. Let $(X, Y)$ be $\delta$-independent, and let $Z = f(Y, B)$ for some function $f$ and $B$ that is 
independent of both $X$ and $Y$. Then $(X, Z)$ are also $\delta$-independent. 

Proof. Let $\mu_{(X,Y)}$ be a joint distribution of $X$ and $Y$ satisfying the conditions of the claim. Then 
since $(X, Y)$ are $\delta$-independent, 

$$d_{TV}(\mu_{(X,Y)}, \mu_X \times \mu_Y) \le \delta.$$

Since $B$ is independent of both $X$ and $Y$, 

$$d_{TV}(\mu_{(X,Y)} \times \mu_B, \mu_X \times \mu_Y \times \mu_B) \le \delta$$

and $(X, Y, B)$ are $\delta$-independent. Therefore there exists a coupling between $(X_1, Y_1, B_1) \sim \mu_{(X,Y)} \times \mu_B$ 
and $(X_2, Y_2, B_2) \sim \mu_X \times \mu_Y \times \mu_B$ such that $\mathbb{P}[(X_1, Y_1, B_1) \neq (X_2, Y_2, B_2)] \le \delta$. Then 

$$\mathbb{P}[(X_1, f(Y_1, B_1)) \neq (X_2, f(Y_2, B_2))] \le \delta$$

and the proof follows. □ 

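Claim 3.20. Let $A = (A_1, \ldots, A_k)$ and $X$ be random variables. Let $(A_1, \ldots, A_k)$ be $\delta_1$-independent 
and let $(A, X)$ be $\delta_2$-independent. Then $(A_1, \ldots, A_k, X)$ are $(\delta_1 + \delta_2)$-independent. 

Proof. Let $\mu_{(A_1,\ldots,A_k,X)}$ be the joint distribution of $A = (A_1, \ldots, A_k)$ and $X$. Then since $(A_1, \ldots, A_k)$ 
are $\delta_1$-independent, 

$$d_{TV}(\mu_A, \mu_{A_1} \times \cdots \times \mu_{A_k}) \le \delta_1.$$

Hence 

$$d_{TV}(\mu_A \times \mu_X, \mu_{A_1} \times \cdots \times \mu_{A_k} \times \mu_X) \le \delta_1.$$

Since $(A, X)$ are $\delta_2$-independent, 

$$d_{TV}(\mu_{(A,X)}, \mu_A \times \mu_X) \le \delta_2.$$

The claim then follows from the triangle inequality 

$$d_{TV}(\mu_{(A,X)}, \mu_{A_1} \times \cdots \times \mu_{A_k} \times \mu_X) \le d_{TV}(\mu_{(A,X)}, \mu_A \times \mu_X) + d_{TV}(\mu_A \times \mu_X, \mu_{A_1} \times \cdots \times \mu_{A_k} \times \mu_X).$$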

□ 

Lemma 3.21. For every $1/2 < p < 1$ there exist $\delta = \delta(p) > 0$ and $\eta = \eta(p) > 0$ such that if $S$ 
and $(X_1, X_2, X_3)$ are binary random variables with $\mathbb{P}[S = 1] = 1/2$, $1/2 < p - \eta \le \mathbb{P}[X_i = S] < 1$, 
and $(X_1, X_2, X_3)$ are $\delta$-independent conditioned on $S$, then $\mathbb{P}[a(X_1, X_2, X_3) = S] > p$, where $a$ is 
the MAP estimator of $S$ given $(X_1, X_2, X_3)$. 

In other words, one's odds of guessing $S$ using three conditionally almost-independent bits are 
greater than using a single bit. 

Proof. We apply Lemma 3.22 below to three conditionally independent bits which are each equal 
to $S$ w.p. at least $p - \eta$. Then 

$$\mathbb{P}[a(X_1, X_2, X_3) = S] \ge p - \eta + \epsilon_{p-\eta},$$

where $\epsilon_q = \frac{1}{100}(2q - 1)(3q^2 - 2q^3 - q)$. 

Since $\epsilon_q$ is continuous in $q$ and positive for $1/2 < q < 1$, it follows that for $\eta$ small enough 
$p - \eta + \epsilon_{p-\eta} > p$. Now, take $\delta < \epsilon_{p-\eta} - \eta$. Then, since we can couple $\delta$-independent bits to 
independent bits so that they differ with probability at most $\delta$, the claim follows. □ 

Lemma 3.22. Let $S$ and $(X_1, X_2, X_3)$ be binary random variables such that $\mathbb{P}[S = 1] = 1/2$. Let 
$1/2 < p \le \mathbb{P}[X_i = S] < 1$. Let $a(X_1, X_2, X_3)$ be the MAP estimator of $S$ given $(X_1, X_2, X_3)$. Then 
there exists an $\epsilon_p > 0$ that depends only on $p$ such that if $(X_1, X_2, X_3)$ are independent conditioned 
on $S$ then $\mathbb{P}[a(X_1, X_2, X_3) = S] \ge p + \epsilon_p$. 
In particular the statement holds with 

$$\epsilon_p = \tfrac{1}{100}(2p - 1)(3p^2 - 2p^3 - p).$$

Proof. Denote $X = (X_1, X_2, X_3)$. 

Assume first that $\mathbb{P}[X_i = S] = p$ for all $i$. Let $\delta_1, \delta_2, \delta_3$ be such that $p + \delta_i = \mathbb{P}[X_i = 1 \mid S = 1]$ 
and $p - \delta_i = \mathbb{P}[X_i = 0 \mid S = 0]$. 

To show that $\mathbb{P}[a(X) = S] \ge p + \epsilon_p$ it is enough to show that $\mathbb{P}[b(X) = S] \ge p + \epsilon_p$ for some 
estimator $b$, by the definition of a MAP estimator. We separate into three cases. 

1. If $\delta_1 = \delta_2 = \delta_3 = 0$ then the events $X_i = S$ are independent and the majority of the $X_i$'s 
is equal to $S$ with probability $p' = p^3 + 3p^2(1 - p)$, which is greater than $p$ for $1/2 < p < 1$. 
Denote $\eta_p = p' - p$. Then $\mathbb{P}[a(X) = S] \ge p + \eta_p$. 

2. Otherwise, if $|\delta_i| \le \eta_p/6$ for all $i$ then we can couple $X$ to three bits $Y = (Y_1, Y_2, Y_3)$ which 
satisfy the conditions of case 1 above, and so that $\mathbb{P}[X \neq Y] \le \eta_p/2$. Then $\mathbb{P}[a(X) = S] \ge p + \eta_p/2$. 

3. Otherwise we claim that there exist $i$ and $j$ such that $|\delta_i + \delta_j| > \eta_p/12$. 

Indeed assume w.l.o.g. that $\delta_1 \ge \eta_p/6$. Then if it doesn't hold that $\delta_1 + \delta_2 \ge \eta_p/12$ and 
it doesn't hold that $\delta_1 + \delta_3 \ge \eta_p/12$ then $\delta_2 \le -\eta_p/12$ and $\delta_3 \le -\eta_p/12$ and therefore 
$\delta_2 + \delta_3 \le -\eta_p/12$. 

Now that this claim is proved, assume w.l.o.g. that $\delta_1 + \delta_2 \ge \eta_p/12$. Recall that $X_i \in \{0, 1\}$, 
and so the product $X_1 X_2$ is also an element of $\{0, 1\}$. Then 

$$\begin{aligned}
\mathbb{P}[X_1 X_2 = S] &= \tfrac{1}{2}\mathbb{P}[X_1 X_2 = 1 \mid S = 1] + \tfrac{1}{2}\mathbb{P}[X_1 X_2 = 0 \mid S = 0] \\
&= \tfrac{1}{2}\big((p + \delta_1)(p + \delta_2) + (p - \delta_1)(p - \delta_2) + (p - \delta_1)(1 - p + \delta_2) + (1 - p + \delta_1)(p - \delta_2)\big) \\
&= p + \tfrac{1}{2}(2p - 1)(\delta_1 + \delta_2) \\
&\ge p + (2p - 1)\eta_p/12,
\end{aligned}$$

and so $\mathbb{P}[a(X) = S] \ge p + (2p - 1)\eta_p/12$. 

Finally, we need to consider the case that $\mathbb{P}[X_i = S] = p_i > p$ for some $i$. We again consider 
two cases. Denote $\epsilon_p = (2p - 1)\eta_p/100$. If there exists an $i$ such that $p_i > p + \epsilon_p$ then this bit is by 
itself an estimator that equals $S$ with probability at least $p + \epsilon_p$, and therefore the MAP estimator 
equals $S$ with probability at least $p + \epsilon_p$. 

Otherwise $p \le p_i \le p + \epsilon_p$ for all $i$. We will construct a coupling between the distributions 
of $X = (X_1, X_2, X_3)$ and $Y = (Y_1, Y_2, Y_3)$ such that the $Y_i$'s are conditionally independent given 
$S$ and $\mathbb{P}[Y_i = S] = p$ for all $i$, and furthermore $\mathbb{P}[Y \neq X] \le 3\epsilon_p$. By what we've proved so far 
the MAP estimator of $S$ given $Y$ equals $S$ with probability at least $p + (2p - 1)\eta_p/12 \ge p + 8\epsilon_p$. 
Hence by the coupling, the same estimator applied to $X$ is equal to $S$ with probability at least 
$p + 8\epsilon_p - 3\epsilon_p > p + \epsilon_p$. 

To couple $X$ and $Y$ let the $Z_i$ be real i.i.d. random variables uniform on $[0, 1]$. When $S = 1$ let 
$X_i = Y_i = S$ if $Z_i > p_i + \delta_i$, let $X_i = S$ and $Y_i = 1 - S$ if $Z_i \in [p + \delta_i, p_i + \delta_i]$, and otherwise 
$X_i = Y_i = 1 - S$. The construction for $S = 0$ is similar. It is clear that $X$ and $Y$ have the required 
distribution, and that furthermore $\mathbb{P}[X_i \neq Y_i] = p_i - p \le \epsilon_p$. Hence $\mathbb{P}[X \neq Y] \le 3\epsilon_p$, as needed. 

□ 
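
The closed-form quantities used in Lemmas 3.21 and 3.22 are easy to evaluate numerically. The Python sketch below is only an illustration under the assumptions of the proof (conditionally independent bits, uniform prior on $S$); the function names and the sample values of $p$, $\delta_1$, $\delta_2$ and $\eta$ are chosen here and do not come from the paper.

```python
def majority_success(p):
    """P[majority of three independent bits equals S], each bit correct w.p. p."""
    return p**3 + 3 * p**2 * (1 - p)


def eta(p):
    """eta_p = p' - p = 3p^2 - 2p^3 - p, the gain of the majority over one bit."""
    return majority_success(p) - p


def epsilon(p):
    """epsilon_p = (2p - 1) * eta_p / 100, as in Lemma 3.22."""
    return (2 * p - 1) * eta(p) / 100


def product_success(p, d1, d2):
    """P[X1 * X2 = S] for conditionally independent bits with
    P[Xi = 1 | S = 1] = p + di and P[Xi = 0 | S = 0] = p - di."""
    given_one = (p + d1) * (p + d2)                # both bits equal 1 when S = 1
    given_zero = 1 - (1 - p + d1) * (1 - p + d2)   # not both equal 1 when S = 0
    return 0.5 * given_one + 0.5 * given_zero


def identity_rhs(p, d1, d2):
    """Right-hand side p + (1/2)(2p - 1)(d1 + d2) of the identity in case 3."""
    return p + 0.5 * (2 * p - 1) * (d1 + d2)
```

For instance, with $p = 0.6$ the majority of three independent bits is correct with probability $0.648$, so $\eta_p = 0.048$, and `product_success(0.6, 0.05, 0.02)` agrees with `identity_rhs(0.6, 0.05, 0.02)`. Likewise, for $p = 0.7$ and $\eta = 10^{-4}$ one can check that $p - \eta + \epsilon_{p-\eta} > p$, as the proof of Lemma 3.21 requires.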



3.5 Asymptotic learning 

In this section we prove Theorem 2. 

Theorem 2. Let $\mu_0, \mu_1$ be such that for every connected, undirected graph $G$ there exists a random 
variable $A$ such that almost surely $A_u = A$ for all $u \in V$. Then there exists a sequence $q(n) = 
q(n, \mu_0, \mu_1)$ such that $q(n) \to 1$ as $n \to \infty$, and $\mathbb{P}[A = \{S\}] \ge q(n)$, for any choice of undirected, 
connected graph $G$ with $n$ agents. 

To prove this theorem we will need a number of intermediate results, which are given over the 
next few subsections. 



3.5.1 Estimating the limiting optimal action set A 



We would like to show that although the agents have a common optimal action set $A$ only at the 
limit $t \to \infty$, they can estimate this set well at a large enough time $t$. 

The action $A_u(t)$ is agent $u$'s MAP estimator of $S$ at time $t$ (see Remark 2.3). We likewise 
define $K_u(t)$ to be agent $u$'s MAP estimator of $A$, at time $t$: 

$$K_u(t) = \operatorname*{argmax}_{K \in \{\{0\},\{1\},\{0,1\}\}} \mathbb{P}[A = K \mid \mathcal{F}_u(t)]. \qquad (1)$$

We show that the sequence of random variables $K_u(t)$ converges to $A$ for every $u$, or that alternatively 
$K_u(t) = A$ for each agent $u$ and $t$ large enough: 

Lemma 3.23. $\mathbb{P}[\lim_{t \to \infty} K_u(t) = A] = 1$ for all $u \in V$. 



This lemma (3.23) follows by direct application of the more general Lemma 3.24, which we prove 
below. Note that a consequence is that $\lim_{t \to \infty} \mathbb{P}[K_u(t) = A] = 1$. 

Lemma 3.24. Let $\mathcal{K}_1 \subseteq \mathcal{K}_2 \subseteq \cdots$ be a filtration of $\sigma$-algebras, and let $\mathcal{K}_\infty = \cup_t \mathcal{K}_t$. Let $K$ be 
a random variable that takes a finite number of values and is measurable in $\mathcal{K}_\infty$. Let $M(t) = 
\operatorname{argmax}_k \mathbb{P}[K = k \mid \mathcal{K}_t]$ be the MAP estimator of $K$ given $\mathcal{K}_t$. Then 

$$\mathbb{P}\left[\lim_{t \to \infty} M(t) = K\right] = 1.$$

Proof. For each $k$ in the support of $K$, $\mathbb{P}[K = k \mid \mathcal{K}_t]$ is a bounded martingale which converges almost 
surely to $\mathbb{P}[K = k \mid \mathcal{K}_\infty]$, which is equal to $\mathbf{1}(K = k)$, since $K$ is measurable in $\mathcal{K}_\infty$. Therefore 
$M(t) = \operatorname{argmax}_k \mathbb{P}[K = k \mid \mathcal{K}_t]$ converges almost surely to $\operatorname{argmax}_k \mathbb{P}[K = k \mid \mathcal{K}_\infty] = K$. □ 
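
A toy instance may help fix ideas. In the Python sketch below, which is a hypothetical example and not taken from the paper, $K$ is the unknown bias of a coin, assumed to be one of two values with a uniform prior, and the filtration is generated by the first $t$ flips. The function returns the MAP estimate $M(t)$ from the observed flips.

```python
from math import log


def map_estimate(flips, biases=(0.3, 0.7)):
    """MAP estimate of a coin's bias from observed flips (1 = heads).

    Assumes a uniform prior over the candidate biases, so the MAP estimate
    is the candidate with the largest log-likelihood. Ties go to the first
    candidate listed.
    """
    def log_likelihood(bias):
        return sum(log(bias) if flip else log(1 - bias) for flip in flips)

    return max(biases, key=log_likelihood)
```

By the law of large numbers the bias is determined by the infinite sequence of flips, so it is measurable in the limiting $\sigma$-algebra and the lemma applies: almost surely the estimate stops changing after finitely many flips.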

We would like at this point to provide the reader with some more intuition on $A_u(t)$, $K_u(t)$ and 
the difference between them. Assuming that $A = \{1\}$ then by definition, from some time $t_0$ on, 
$A_u(t) = 1$, and from Lemma 3.23, $K_u(t) = \{1\}$. The same applies when $A = \{0\}$. However, when 
$A = \{0, 1\}$ then $A_u(t)$ may take both values 0 and 1 infinitely often, but $K_u(t)$ will eventually equal 
$\{0, 1\}$. That is, agent $u$ will realize at some point that, although it thinks at the moment that 1 is 
preferable to 0 (for example), it is in fact the most likely outcome that its belief will converge to 
1/2. In this case, although it is not optimal, a uniformly random guess of which is the best action 
may not be so bad. Our next definition is based on this observation. 

Based on $K_u(t)$, we define a second "action" $C_u(t)$. 

Definition 3.25. Let $C_u(t)$ be picked uniformly from $K_u(t)$: if $K_u(t) = \{1\}$ then $C_u(t) = 1$, if 
$K_u(t) = \{0\}$ then $C_u(t) = 0$, and if $K_u(t) = \{0, 1\}$ then $C_u(t)$ is picked independently from the 
uniform distribution over $\{0, 1\}$. 
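
Definition 3.25 amounts to the following rule, sketched here in Python. The function name and the use of an explicit random generator for the extra independent bit are illustrative choices, not part of the paper.

```python
import random


def sample_action(K, rng):
    """Pick C uniformly from K, a nonempty subset of {0, 1}.

    If K is a singleton the choice is deterministic; if K == {0, 1} the
    result is a fair bit drawn from rng, independently of everything else.
    """
    options = sorted(K)
    if len(options) == 1:
        return options[0]
    return rng.choice(options)
```

For example, `sample_action({1}, random.Random(0))` always returns 1, while `sample_action({0, 1}, rng)` returns 0 or 1 with equal probability.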

Note that we here extend our probability space by including in $I_u(t)$ (the observations of agent 
$u$ up to time $t$) an extra uniform bit that is independent of all else and $S$ in particular. Hence this 
does not increase $u$'s ability to estimate $S$, and if we can show that in this setting $u$ learns $S$ then 
$u$ can also learn $S$ without this bit. In fact, we show that asymptotically it is as good an estimate 
for $S$ as the best estimate $A_u(t)$: 

Claim 3.26. $\lim_{t \to \infty} \mathbb{P}[C_u(t) = S] = \lim_{t \to \infty} \mathbb{P}[A_u(t) = S] = p$ for all $u$. 

Proof. We prove the claim by showing that it holds both when conditioning on the event $A = \{0, 1\}$ 
and when conditioning on its complement. 

When $A \neq \{0, 1\}$ then for $t$ large enough $A = \{A_u(t)\}$. Since (by Lemma 3.23) $\lim K_u(t) = A$ 
with probability 1, in this case $C_u(t) = A_u(t)$ for $t$ large enough, and 

$$\lim_{t \to \infty} \mathbb{P}[C_u(t) = S \mid A \neq \{0, 1\}] = \mathbb{P}[A = \{S\} \mid A \neq \{0, 1\}] = \lim_{t \to \infty} \mathbb{P}[A_u(t) = S \mid A \neq \{0, 1\}].$$

When $A = \{0, 1\}$ then $\lim X_u(t) = \lim \mathbb{P}[A_u(t) = S \mid \mathcal{F}_u(t)] = 1/2$ and so $\lim \mathbb{P}[A_u(t) = S] = 
1/2$. This is again also true for $C_u(t)$, since in this case it is picked at random for $t$ large enough, 
and so 

$$\lim_{t \to \infty} \mathbb{P}[C_u(t) = S \mid A = \{0, 1\}] = \tfrac{1}{2} = \lim_{t \to \infty} \mathbb{P}[A_u(t) = S \mid A = \{0, 1\}].$$

□ 

3.5.2 The probability of getting it right 



Recall Definition 3.3: $p_u(t) = \mathbb{P}[A_u(t) = S]$ and $p_u = \lim_{t \to \infty} p_u(t)$ (i.e., $p_u(t)$ is the probability 
that agent $u$ takes the right action at time $t$). We prove here a few easy related claims that will 
later be useful to us. 

Claim 3.27. $p_u(t + 1) \ge p_u(t)$. 

Proof. Condition on $\mathcal{F}_u(t + 1)$, the information available to agent $u$ at time $t + 1$. Hence the 
probability that $A_u(t + 1) = S$ is at least as high as the probability that $A_u(t) = S$, since 

$$A_u(t + 1) = \operatorname*{argmax}_s \mathbb{P}[S = s \mid \mathcal{F}_u(t + 1)]$$

and $A_u(t)$ is measurable in $\mathcal{F}_u(t + 1)$. The claim is proved by integrating over all possible values of 
$\mathcal{F}_u(t + 1)$. □ 



Since $p_u(t)$ is bounded by one, Claim 3.27 means that the limit $p_u$ exists. We show that this 
value is the same for all vertices. 

Claim 3.28. There exists a $p \in [0, 1]$ such that $p_u = p$ for all $u$. 

Proof. Let $u$ and $w$ be neighbors. As in the proof above, we can argue that $\mathbb{P}[A_u(t + 1) = S \mid \mathcal{F}_u(t + 1)] \ge 
\mathbb{P}[A_w(t) = S \mid \mathcal{F}_u(t + 1)]$, since $A_w(t)$ is measurable in $\mathcal{F}_u(t + 1)$. Hence the same holds unconditioned, 
and so we have that $p_u \ge p_w$, by taking the limit $t \to \infty$. Since the same argument can 
be used with the roles of $u$ and $w$ reversed, we have that $p_u = p_w$, and the claim follows from the 
connectedness of the graph, by induction. □ 

We make the following definition in the spirit of these claims: 

Definition 3.29. $p = \lim_{t \to \infty} \mathbb{P}[A_u(t) = S]$. 

In the context of a specific social network graph $G$ we may denote this quantity as $p(G)$. 
For time $t = 1$ the next standard claim follows from the fact that the agents' signals are 
informative. 

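Claim 3.30. $p_u(t) > 1/2$ for all $u$ and $t$. 

Proof. Note that 

$$\mathbb{P}[A_u(1) = S \mid W_u] = \max\{X_u(1), 1 - X_u(1)\} = \max\{\mathbb{P}[S = 0 \mid W_u], \mathbb{P}[S = 1 \mid W_u]\}.$$

Recall that $p_u(1) = \mathbb{P}[A_u(1) = S]$. Hence 

$$\begin{aligned}
p_u(1) &= \mathbb{E}\big[\mathbb{P}[A_u(1) = S \mid W_u]\big] \\
&= \mathbb{E}\big[\max\{\mathbb{P}[S = 0 \mid W_u], \mathbb{P}[S = 1 \mid W_u]\}\big].
\end{aligned}$$

Since $\max\{a, b\} = \tfrac{1}{2}(a + b) + \tfrac{1}{2}|a - b|$, and since $\mathbb{P}[S = 0 \mid W_u] + \mathbb{P}[S = 1 \mid W_u] = 1$, it follows that 

$$\begin{aligned}
p_u(1) &= \tfrac{1}{2} + \tfrac{1}{2}\mathbb{E}\big[\,|\mathbb{P}[S = 0 \mid W_u] - \mathbb{P}[S = 1 \mid W_u]|\,\big] \\
&= \tfrac{1}{2} + \tfrac{1}{2} D_{TV}(\mu_0, \mu_1),
\end{aligned}$$

where the last equality follows by Bayes' rule. Since $\mu_0 \neq \mu_1$, the total variation distance 
$D_{TV}(\mu_0, \mu_1) > 0$ and $p_u(1) > \tfrac{1}{2}$. For $t > 1$ the claim follows from Claim 3.27 above. □ 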
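
For signals taking finitely many values, the identity $p_u(1) = \tfrac{1}{2} + \tfrac{1}{2} D_{TV}(\mu_0, \mu_1)$ can be verified directly. The Python sketch below is an illustration only: the function names and the two example signal distributions are assumptions made here, with $D_{TV}$ taken to be half the $\ell_1$ distance.

```python
def first_round_success(mu0, mu1):
    """P[A_u(1) = S] for a MAP guess from one private signal, uniform prior on S.

    mu0 and mu1 map signal values to probabilities under S = 0 and S = 1.
    The joint probability of (S = s, W = w) is 0.5 * mu_s(w), and the MAP
    guess is correct on the larger of the two for each signal value w.
    """
    signals = set(mu0) | set(mu1)
    return sum(max(0.5 * mu0.get(w, 0.0), 0.5 * mu1.get(w, 0.0)) for w in signals)


def total_variation(mu0, mu1):
    """Total variation distance (half the l1 distance) between mu0 and mu1."""
    signals = set(mu0) | set(mu1)
    return 0.5 * sum(abs(mu0.get(w, 0.0) - mu1.get(w, 0.0)) for w in signals)


# Hypothetical binary signal distributions under S = 0 and S = 1.
example_mu0 = {0: 0.7, 1: 0.3}
example_mu1 = {0: 0.4, 1: 0.6}
```

Here the total variation distance is $0.3$ and the first-round success probability is $0.65 = \tfrac{1}{2} + \tfrac{1}{2} \cdot 0.3$.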



Recall that $|N(u)|$ is the out-degree of $u$, or the number of neighbors that $u$ observes. The next 
lemma states that an agent with many neighbors will have a good estimate of $S$ already at the 
second round, after observing the first action of its neighbors. This lemma is adapted from Mossel 
and Tamuz [13], and provided here for completeness. 

Lemma 3.31. There exist constants $C_1 = C_1(\mu_0, \mu_1)$ and $C_2 = C_2(\mu_0, \mu_1)$ such that for any agent 
$u$ it holds that 

$$p_u(2) \ge 1 - C_1 e^{-C_2 \cdot |N(u)|}.$$

Proof. Conditioned on $S$, private signals are independent and identically distributed. Since $A_w(1)$ is 
a deterministic function of $W_w$, the initial actions $A_w(1)$ are also identically distributed, conditioned 
on $S$. Hence there exists a $q$ such that $p_w(1) = \mathbb{P}[A_w(1) = S] = q$ for all agents $w$. By Claim 3.30 
above, $q > 1/2$. Therefore 

$$\mathbb{P}[A_w(1) = 1 \mid S = 1] \neq \mathbb{P}[A_w(1) = 1 \mid S = 0],$$

and the distribution of $A_w(1)$ is different when conditioned on $S = 0$ or $S = 1$. 

Fix an agent $u$, and let $n = |N(u)|$ be the out-degree of $u$, or the number of neighbors that it 
observes. Let $\{w_1, \ldots, w_{|N(u)|}\}$ be the set of $u$'s neighbors. Recall that $A_u(2)$ is the MAP estimator 
of $S$ given $(A_{w_1}(1), \ldots, A_{w_n}(1))$, and given $u$'s private signal. 

By standard asymptotic statistics of hypothesis testing (cf. [7]), testing a hypothesis (in our 
case, say, $S = 1$ vs. $S = 0$) given $n$ informative, conditionally i.i.d. signals succeeds except with 
probability that is exponentially low in $n$. It follows that $\mathbb{P}[A_u(2) \neq S]$ is exponentially small in $n$, 
so that there exist $C_1$ and $C_2$ such that 

$$p_u(2) = \mathbb{P}[A_u(2) = S] \ge 1 - C_1 e^{-C_2 \cdot |N(u)|}.$$

□ 
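
To see the exponential decay concretely, one can look at a simpler estimator than the MAP estimator used in the proof: the majority vote over $n$ conditionally i.i.d. bits, each equal to $S$ with probability $q > 1/2$. The Python sketch below is an illustration under that simplifying assumption. It computes the exact error probability of the majority vote and compares it with the Hoeffding bound $e^{-2n(q - 1/2)^2}$, which is a swapped-in standard bound rather than the result cited as [7]. The function names are chosen here.

```python
from math import comb, exp


def majority_error(n, q):
    """Exact P[majority of n i.i.d. bits is wrong], each bit correct w.p. q.

    Ties (possible only for even n) are counted as errors.
    """
    return sum(
        comb(n, j) * q**j * (1 - q) ** (n - j)
        for j in range(n + 1)
        if 2 * j <= n
    )


def hoeffding_bound(n, q):
    """Hoeffding upper bound exp(-2 n (q - 1/2)^2) on the majority error."""
    return exp(-2 * n * (q - 0.5) ** 2)
```

Since the majority vote is a function of the observed actions, the MAP estimator does at least as well, so for this simplified comparison one may take $C_1 = 1$ and $C_2 = 2(q - 1/2)^2$.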

The following claim is a direct consequence of the previous lemmas of this section. 

Claim 3.32. Let $d(G) = \sup_u \{|N(u)|\}$ be the out-degree of the graph $G$; note that for infinite graphs 
it may be that $d = \infty$. Then there exist constants $C_1 = C_1(\mu_0, \mu_1)$ and $C_2 = C_2(\mu_0, \mu_1)$ such that 

$$p(G) \ge 1 - C_1 e^{-C_2 \cdot d(G)}$$

for all agents $u$. 



Proof. Let $u$ be an arbitrary vertex in $G$. Then by Lemma 3.31 it holds that 

$$p_u(2) \ge 1 - C_1 e^{-C_2 \cdot |N(u)|},$$

for some constants $C_1$ and $C_2$. By Claim 3.27 we have that $p_u(t + 1) \ge p_u(t)$, and therefore 

$$p_u = \lim_{t \to \infty} p_u(t) \ge 1 - C_1 e^{-C_2 \cdot |N(u)|}.$$

Finally, $p(G) = p_u$ by Claim 3.28, and so 

$$p(G) \ge 1 - C_1 e^{-C_2 \cdot |N(u)|}.$$

Since this holds for an arbitrary vertex $u$, the claim follows. □ 

3.5.3 Local limits and pessimal graphs 



We now turn to apply local limits to our process. We consider here and henceforth the same model 
of Definitions 2.1 and 2.2 as applied, with the same private signals, to different graphs. We write 
p(G) for the value of p on the process on G, A(G) for the value of A on G, etc. 



Lemma 3.33. Let $(G, u) = \lim_{r \to \infty} (G_r, u_r)$. Then $p(G) \le \liminf_r p(G_r)$. 

Proof. Since $B_r(G_r, u_r) = B_r(G, u)$, by Lemma 3.16 we have that $p_u(r) = p_{u_r}(r)$. By Claim 3.27, 
$p_{u_r}(r) \le p(G_r)$, and therefore $p_u(r) \le p(G_r)$. The claim follows by taking the liminf of both 
sides. □ 

A particularly interesting case is the one where the different $G_r$'s are all the same graph: 

Corollary 3.34. Let $G$ be a (perhaps infinite) graph, and let $\{u_r\}$ be a sequence of vertices. Then 
if the local limit $(H, u) = \lim_{r \to \infty} (G, u_r)$ exists then $p(H) \le p(G)$. 

Recall that $\mathcal{B}_d$ denotes the set of infinite, connected, undirected graphs of degree at most $d$. Let 

$$\mathcal{B} = \bigcup_d \mathcal{B}_d.$$



Definition 3.35. Let 

$$p^* = p^*(\mu_0, \mu_1) = \inf_{G \in \mathcal{B}} p(G)$$

be the probability of learning in the pessimal graph. 

Note that by Claim 3.30 we have that $p^* > 1/2$. We show that this infimum is in fact attained 
by some graph: 

Lemma 3.36. There exists a graph $H \in \mathcal{B}$ such that $p(H) = p^*$. 

Proof. Let $\{G_r = (V_r, E_r)\}_{r=1}^{\infty}$ be a series of graphs in $\mathcal{B}$ such that $\lim_{r \to \infty} p(G_r) = p^*$. Note 
that $\{G_r\}$ must all be in $\mathcal{B}_d$ for some $d$ (i.e., have uniformly bounded degrees), since otherwise the 
sequence $p(G_r)$ would have values arbitrarily close to 1 and its limit could not be $p^*$ (unless indeed 
$p^* = 1$, in which case our main Theorem 2 is proved). This follows from Lemma 3.31. 

We now arbitrarily mark a vertex $u_r$ in each graph, so that $u_r \in V_r$, and let $(H, u)$ be the limit 
of some subsequence of $\{(G_r, u_r)\}_{r=1}^{\infty}$. Since $\mathcal{B}_d$ is compact (Lemma 3.12), $(H, u)$ is guaranteed to 
exist, and $H \in \mathcal{B}_d$. 

By Lemma 3.33 we have that $p(H) \le \liminf_r p(G_r) = p^*$. But since $H \in \mathcal{B}$, $p(H)$ cannot be 
less than $p^*$, and the claim is proved. 

□ 



3.5.4 Independent bits 

We now show that on infinite graphs, the private signals in the neighborhood of agents that are 
"far enough away" are (conditioned on S) almost independent of A (the final consensus estimate 
of S). 

Lemma 3.37. Let $G$ be an infinite graph. Fix a vertex $u_0$ in $G$. Then for every $\delta > 0$ there exists 
an $r_\delta$ such that for every $r > r_\delta$ and every vertex $u$ with $d(u_0, u) > 2r$ it holds that $W(B_r(G, u))$, 
the private signals in $B_r(G, u)$, are $\delta$-independent of $A$, conditioned on $S$. 

Here we denote graph distance by $d(\cdot, \cdot)$. 



Proof. Fix $u_0$, and let $u$ be such that $d(u_0, u) > 2r$. Then $B_r(G, u_0)$ and $B_r(G, u)$ are disjoint, and 
hence independent conditioned on $S$. Hence $K_{u_0}(r)$ is independent of $W(B_r(G, u))$, conditioned on 
$S$. 

Lemma 3.23 states that $\mathbb{P}[\lim_{r \to \infty} K_{u_0}(r) = A] = 1$, and so there exists an $r_\delta$ such that for 
every $r > r_\delta$ it holds that $\mathbb{P}[K_{u_0}(r) = A] > 1 - \tfrac{1}{2}\delta$. 

Recall Claim 3.18: for any $A, B, C$, if $\mathbb{P}[A = B] = 1 - \tfrac{1}{2}\delta$ and $B$ is independent of $C$, then 
$(A, C)$ are $\delta$-independent. 

Applying Claim 3.18 to $A$, $K_{u_0}(r)$ and $W(B_r(G, u))$ we get that for any $r$ greater than $r_\delta$ it 
holds that $W(B_r(G, u))$ is $\delta$-independent of $A$, conditioned on $S$. □ 

We will now show, in the lemmas below, that in infinite graphs each agent has access to any 
number of "good estimators": $\delta$-independent measurements of $S$ that are each almost as likely to 
equal $S$ as $p^*$, the minimal probability of estimating $S$ on any infinite graph. 

Definition 3.38. We say that agent $u \in G$ has $k$ $(\delta, \epsilon)$-good estimators if there exists a time $t$ 
and estimators $M_1, \ldots, M_k$ such that $(M_1, \ldots, M_k) \in \mathcal{F}_u(t)$ and 

1. $\mathbb{P}[M_i = S] > p^* - \epsilon$ for $1 \le i \le k$. 

2. $(M_1, \ldots, M_k)$ are $\delta$-independent, conditioned on $S$. 

Claim 3.39. Let $\mathcal{P}$ denote the property of having $k$ $(\delta, \epsilon)$-good estimators. Then $\mathcal{P}$ is a local 
property (Definition 3.10) of the rooted graph $(G, u)$. Furthermore, if $u \in G$ has $k$ $(\delta, \epsilon)$-good 
estimators measurable in $\mathcal{F}_u(t)$ then $(G, u) \in \mathcal{P}^{(t)}$, i.e., $(G, u)$ has property $\mathcal{P}$ with radius $t$. 

Proof. If $(G, u) \in \mathcal{P}$ then by definition there exists a time $t$ such that $(M_1, \ldots, M_k) \in \mathcal{F}_u(t)$. 
Hence by Lemma 3.15, if $B_t(G, u) = B_t(G', u')$ then $u' \in G'$ also has $k$ $(\delta, \epsilon)$-good estimators 
$(M'_1, \ldots, M'_k) \in \mathcal{F}_{u'}(t)$ and $(G', u') \in \mathcal{P}$. In particular, $(G, u) \in \mathcal{P}^{(t)}$, i.e., $(G, u)$ has property $\mathcal{P}$ 
with radius $t$. □ 

We are now ready to prove the main lemma of this subsection: 

Lemma 3.40. For every $d \ge 2$, $G \in \mathcal{B}_d$, $\epsilon, \delta > 0$ and $k \ge 0$ there exists a vertex $u$ such that $u$ has 
$k$ $(\delta, \epsilon)$-good estimators. 

Informally, this lemma states that if G is an infinite graph with bounded degrees, then there 
exists an agent that eventually has k almost-independent estimates of S with quality close to p* , 
the minimal probability of learning. 

Proof. In this proof we use the term "independent" to mean "independent conditioned on S" . 

We choose an arbitrary $d$ and prove by induction on $k$. The basis $k = 0$ is trivial. Assume the 
claim holds for $k$, any $G \in \mathcal{B}_d$ and all $\epsilon, \delta > 0$. We shall show that it holds for $k + 1$, any $G \in \mathcal{B}_d$ 
and any $\delta, \epsilon > 0$. 

By the inductive hypothesis, for every $G \in \mathcal{B}_d$ there exists a vertex in $G$ that has $k$ $(\delta/100, \epsilon)$-good 
estimators $(M_1, \ldots, M_k)$. 

Now, having $k$ $(\delta/100, \epsilon)$-good estimators is a local property (Claim 3.39). We now therefore 
apply Lemma 3.13: since every graph $G \in \mathcal{B}_d$ has a vertex with $k$ $(\delta/100, \epsilon)$-good estimators, any 
graph $G \in \mathcal{B}_d$ has a time $t_k$ for which infinitely many distinct vertices $\{w_r\}$ have $k$ $(\delta/100, \epsilon)$-good 
estimators measurable at time $t_k$. 

In particular, if we fix an arbitrary $u_0 \in G$ then for every $r$ there exists a vertex $w \in G$ that 
has $k$ $(\delta/100, \epsilon)$-good estimators and whose distance $d(u_0, w)$ from $u_0$ is larger than $r$. 

We shall prove the lemma by showing that for a vertex $w$ that is far enough from $u_0$ and which has 
$(\delta/100, \epsilon)$-good estimators $(M_1, \ldots, M_k)$, it holds that for a time $t_{k+1}$ large enough $(M_1, \ldots, M_k, C_w(t_{k+1}))$ 
are $(\delta, \epsilon)$-good estimators. 



By Lemma 3.37 there exists an $r_\delta$ such that if $r > r_\delta$ and $d(u_0, w) > 2r$ then $W(B_r(G, w))$ 
is $\delta/100$-independent of $A$. Let $r^* = \max\{r_\delta, t_k\}$, where $t_k$ is such that there are infinitely many 
vertices in $G$ with $k$ good estimators measurable at time $t_k$. 

Let $w$ be a vertex with $k$ $(\delta/100, \epsilon)$-good estimators $(M_1, \ldots, M_k)$ at time $t_k$, such that $d(u_0, w) > 
2r^*$. Denote 

$$M = (M_1, \ldots, M_k).$$

Since $d(u_0, w) > 2r_\delta$, $W(B_{r^*}(G, w))$ is $\delta/100$-independent of $A$, and since $B_{t_k}(G, w) \subseteq B_{r^*}(G, w)$, 
$W(B_{t_k}(G, w))$ is $\delta/100$-independent of $A$. Finally, since $M \in \mathcal{F}_w(t_k)$, $M$ is a function of $W(B_{t_k}(G, w))$, 
and so by Claim 3.19 we have that $M$ is also $\delta/100$-independent of $A$. 



For $t_{k+1}$ large enough it holds that 

• $K_w(t_{k+1})$ is equal to $A$ with probability at least $1 - \delta/100$, since 

$$\lim_{t \to \infty} \mathbb{P}[K_w(t) = A] = 1,$$

by Lemma 3.23. 

• Additionally, $\mathbb{P}[C_w(t_{k+1}) = S] > p^* - \epsilon$, since 

$$\lim_{t \to \infty} \mathbb{P}[C_w(t) = S] = p \ge p^*,$$

by Claim 3.26. 

We have then that $(M, A)$ are $\delta/100$-independent and $\mathbb{P}[K_w(t_{k+1}) \neq A] \le \delta/100$. Claim 3.18 
states that if $(A, B)$ are $\delta$-independent and $\mathbb{P}[B \neq C] \le \delta'$ then $(A, C)$ are $(\delta + 2\delta')$-independent. Applying 
this here we get that $(M, K_w(t_{k+1}))$ are $\delta/25$-independent. 

It follows by application of Claim 3.20 that $(M_1, \ldots, M_k, K_w(t_{k+1}))$ are $\delta$-independent. Since 
$C_w(t_{k+1})$ is a function of $K_w(t_{k+1})$ and an independent bit, it follows by another application of 
Claim 3.19 that $(M_1, \ldots, M_k, C_w(t_{k+1}))$ are also $\delta$-independent. 

Finally, since $\mathbb{P}[C_w(t_{k+1}) = S] > p^* - \epsilon$, $w$ has the $k + 1$ $(\delta, \epsilon)$-good estimators $(M_1, \ldots, M_k, C_w(t_{k+1}))$ 
and the proof is concluded. 

□ 



3.5.5 Asymptotic learning 

As a tool in the analysis of finite graphs, we would like to prove that in infinite graphs the agents 
learn the correct state of the world almost surely. 

Theorem 3.41. Let $G = (V, E)$ be an infinite, connected undirected graph with bounded degrees 
(i.e., $G$ is a general graph in $\mathcal{B}$). Then $p(G) = 1$. 

Note that an alternative phrasing of this theorem is that $p^* = 1$. 

Proof. Assume the contrary, i.e. $p^* < 1$. Let $H$ be an infinite, connected graph with bounded 
degrees such that $p(H) = p^*$, such as we've shown exists in Lemma 3.36. 

By Lemma 3.40 there exists for arbitrarily small $\epsilon, \delta > 0$ a vertex $w \in H$ that has access at 
some time $T$ to three $\delta$-independent estimators (conditioned on $S$), each of which is equal to $S$ 
with probability at least $p^* - \epsilon$. By Lemma 3.21 and Claim 3.30, the MAP estimator of $S$ using these 
estimators equals $S$ with probability higher than $p^*$, for the appropriate choice of low enough $\epsilon, \delta$. 
Therefore, since $w$'s action $A_w(T)$ is the MAP estimator of $S$, its probability of equaling $S$ is 
$\mathbb{P}[A_w(T) = S] > p^*$ as well, and so $p(H) > p^*$, a contradiction. □ 



Using Theorem 3.41 we prove Theorem 2, which is the corresponding theorem for finite graphs: 

Theorem 2. Let $\mu_0, \mu_1$ be such that for every connected, undirected graph $G$ there exists a random 
variable $A$ such that almost surely $A_u = A$ for all $u \in V$. Then there exists a sequence $q(n) = 
q(n, \mu_0, \mu_1)$ such that $q(n) \to 1$ as $n \to \infty$, and $\mathbb{P}[A = \{S\}] \ge q(n)$, for any choice of undirected, 
connected graph $G$ with $n$ agents. 

Proof. Assume the contrary. Then there exists a series of graphs $\{G_r\}$ with $r$ agents such that 
$\lim_{r \to \infty} \mathbb{P}[A(G_r) = \{S\}] < 1$, and so also $\lim_{r \to \infty} p(G_r) < 1$. 

By the same argument of Theorem 3.41 these graphs must all be in $\mathcal{B}_d$ for some $d$, since 
otherwise, by Claim 3.32, there would exist a subsequence of graphs $\{G_{r_d}\}$ with degree at least $d$ 
and $\lim_{d \to \infty} p(G_{r_d}) = 1$. Since $\mathcal{B}_d$ is compact (Lemma 3.12), there exists a graph $(G, u) \in \mathcal{B}_d$ that 
is the limit of a subsequence of $\{(G_r, u_r)\}_{r=1}^{\infty}$. 

Since $G$ is infinite and of bounded degree, it follows by Theorem 3.41 that $p(G) = 1$, and in 
particular $\lim_{r \to \infty} p_u(r) = 1$. As before, $p_{u_r}(r) = p_u(r)$, and therefore $\lim_{r \to \infty} p_{u_r}(r) = 1$. Since 
$p(G_r) \ge p_{u_r}(r)$, $\lim_{r \to \infty} p(G_r) = 1$, which is a contradiction. 
□ 



3.6 Convergence to identical optimal action sets 

In this section we prove Theorem 1. 

Theorem 1. Let $(\mu_0, \mu_1)$ induce non-atomic beliefs. Then there exists a random variable $A$ such 
that almost surely $A_u = A$ for all $u$. 

In this section we shall assume henceforth that the distribution of initial private beliefs is 
non-atomic. 



3.6.1 Previous work 

The following theorem is due to Gale and Kariv [9]. Given two agents $u$ and $w$, let $E_u^0$ denote the 
event that $A_u(t)$ equals 0 infinitely often and $E_w^1$ the event that $A_w(t)$ equals 1 infinitely often. 

Theorem 3.42 (Gale and Kariv). If agent $u$ observes agent $w$'s actions then 

$$\mathbb{P}[E_u^0, E_w^1] = \mathbb{P}[X_u = 1/2, E_u^0, E_w^1].$$

I.e., if agent $u$ takes action 0 infinitely often, agent $w$ takes action 1 infinitely often, and $u$ observes 
$w$, then $u$'s belief is 1/2 at the limit, almost surely. 

Corollary 3.43. If agent $u$ observes agent $w$'s actions, and $w$ takes both actions infinitely often, 
then $X_u = 1/2$. 

Proof. Assume by contradiction that $X_u < 1/2$. Then $u$ takes action 0 infinitely often. Therefore 
Theorem 3.42 implies that $X_u = 1/2$, a contradiction. 

The case where $X_u > 1/2$ is treated similarly. □ 



3.6.2 Limit log-likelihood ratios 

Denote 

$$Y_u(t) = \log \frac{\mathbb{P}[I_u(t) \mid S = 1, A_u(t)]}{\mathbb{P}[I_u(t) \mid S = 0, A_u(t)]}.$$

In the next claim we show that $Z_u(t)$, the log-likelihood ratio inspired by $u$'s observations up to 
time $t$, can be written as the sum of two terms: $Z_u(1) = \log \frac{d\mu_1}{d\mu_0}(W_u)$, which is the log-likelihood 
ratio inspired by $u$'s private signal $W_u$, and $Y_u(t)$, which depends only on the actions of $u$ and its 
neighbors, and does not depend directly on $W_u$. 

Claim 3.44. 

$$Z_u(t) = Z_u(1) + Y_u(t).$$

Proof. By definition we have that 

$$Z_u(t) = \log \frac{\mathbb{P}[S = 1 \mid \mathcal{F}_u(t)]}{\mathbb{P}[S = 0 \mid \mathcal{F}_u(t)]} = \log \frac{\mathbb{P}[S = 1 \mid I_u(t), W_u]}{\mathbb{P}[S = 0 \mid I_u(t), W_u]},$$

and by the law of conditional probabilities 

$$\begin{aligned}
Z_u(t) &= \log \frac{\mathbb{P}[I_u(t) \mid S = 1, W_u]\,\mathbb{P}[W_u \mid S = 1]}{\mathbb{P}[I_u(t) \mid S = 0, W_u]\,\mathbb{P}[W_u \mid S = 0]} \\
&= \log \frac{\mathbb{P}[I_u(t) \mid S = 1, W_u]}{\mathbb{P}[I_u(t) \mid S = 0, W_u]} + Z_u(1).
\end{aligned}$$

Now $I_u(t)$, the actions of the neighbors of $u$ up to time $t$, are a deterministic function of $W(B_t(G, u))$, 
the private signals in the ball of radius $t$ around $u$, by Claim 3.14. Conditioned on $S$ these are all 
independent, and so, from the definition of actions, these actions depend on $u$'s private signal $W_u$ 
only in as much as it affects the actions of $u$. Hence 

$$\mathbb{P}[I_u(t) \mid S = s, W_u] = \mathbb{P}[I_u(t) \mid S = s, A_u(t)],$$

and therefore 

$$\begin{aligned}
Z_u(t) &= \log \frac{\mathbb{P}[I_u(t) \mid S = 1, A_u(t)]}{\mathbb{P}[I_u(t) \mid S = 0, A_u(t)]} + Z_u(1) \\
&= Z_u(1) + Y_u(t).
\end{aligned}$$

□ 

Note that $Y_u(t)$ is a deterministic function of $I_u(t)$ and $A_u(t)$. 

Following our notation convention, we define $Y_u = \lim_{t \to \infty} Y_u(t)$. Note that this limit exists 
almost surely since the limit of $Z_u(t)$ exists almost surely. The following claim follows directly from 
the definitions: 

Claim 3.45. $Y_u$ is measurable in $(A_u, I_u)$, the actions of $u$ and its neighbors. 

3.6.3 Convergence of actions 

The event that an agent takes both actions infinitely often is (almost surely) a sufficient condition 
for convergence to belief 1/2. This follows from the fact that these actions imply that its belief 
takes values both above and below 1/2 infinitely many times. We show that it is also (almost 
surely) a necessary condition. Denote by $E_u^a$ the event that $u$ takes action $a$ infinitely often. 

Theorem 3.46. 

$$\mathbb{P}[E_u^0 \cap E_u^1, X_u = 1/2] = \mathbb{P}[X_u = 1/2].$$

I.e., it a.s. holds that $X_u = 1/2$ iff $u$ takes both actions infinitely often. 

Proof. We'll prove the claim by showing that $\mathbb{P}[\neg(E_u^0 \cap E_u^1), X_u = 1/2] = 0$, or equivalently that 
$\mathbb{P}[\neg(E_u^0 \cap E_u^1), Z_u = 0] = 0$ (recall that $Z_u = \log X_u/(1 - X_u)$ and so $X_u = 1/2 \Leftrightarrow Z_u = 0$). 

Let $a = (a(1), a(2), \ldots)$ be a sequence of actions, and denote by $W_{-u}$ the private signals of all 
agents except $u$. Conditioning on $W_{-u}$ and $S$ we can write: 

$$\begin{aligned}
\mathbb{P}[A_u = a, Z_u = 0] &= \mathbb{E}\big[\mathbb{P}[A_u = a, Z_u = 0 \mid W_{-u}, S]\big] \\
&= \mathbb{E}\big[\mathbb{P}[A_u = a, Z_u(1) = -Y_u \mid W_{-u}, S]\big],
\end{aligned}$$

where the second equality follows from Claim 3.44. Note that by Claim 3.45, $Y_u$ is fully determined 
by $A_u$ and $W_{-u}$. We can therefore write 

$$\begin{aligned}
\mathbb{P}[A_u = a, Z_u = 0] &= \mathbb{E}\big[\mathbb{P}[A_u = a, Z_u(1) = -Y_u(W_{-u}, a) \mid W_{-u}, S]\big] \\
&\le \mathbb{E}\big[\mathbb{P}[Z_u(1) = -Y_u(W_{-u}, a) \mid W_{-u}, S]\big].
\end{aligned}$$

Now, conditioned on $S$, the private signal $W_u$ is distributed $\mu_S$ and is independent of $W_{-u}$. 
Hence its distribution when further conditioned on $W_{-u}$ is still $\mu_S$. Since $Z_u(1) = \log \frac{d\mu_1}{d\mu_0}(W_u)$, its 
distribution is also unaffected, and in particular is still non-atomic. It therefore equals $-Y_u(W_{-u}, a)$ 
with probability zero, and so 

$$\mathbb{P}[A_u = a, Z_u = 0] = 0.$$

Since this holds for all sequences of actions $a$, it holds in particular for all sequences which converge. 
Since there are only countably many such sequences, the probability that the action converges (i.e., 
$\neg(E_u^0 \cap E_u^1)$) and $Z_u = 0$ is zero, or 

$$\mathbb{P}[\neg(E_u^0 \cap E_u^1), Z_u = 0] = 0.$$

□ 

Hence it is impossible for an agent's belief to converge to 1/2 and for the agent to only take one 
action infinitely often. A direct consequence of this, together with Theorem 3.42, is the following 
corollary: 

Corollary 3.47. The union of the following three events occurs with probability one: 

1. $\forall u \in V : \lim_{t \to \infty} A_u(t) = S$. Equivalently, all agents converge to the correct action. 

2. $\forall u \in V : \lim_{t \to \infty} A_u(t) = 1 - S$. Equivalently, all agents converge to the wrong action. 

3. $\forall u \in V : X_u = 1/2$, and in this case all agents take both actions infinitely often and hence 
don't converge at all. 

Proof. Consider first the case that there exists a vertex $u$ such that $u$ takes both actions infinitely 
often. Let $w$ be a vertex that observes $u$. Then by Corollary 3.43 we have that $X_w = 1/2$, and 
by Theorem 3.46 $w$ also takes both actions infinitely often. Continuing by induction and using the 
fact that the graph is strongly connected we obtain the third case, that none of the agents converge 
and $X_u = 1/2$ for all $u$. 

It remains to consider the case that all agents' actions converge to either 0 or 1. Using strong 
connectivity, to prove the corollary it suffices to show that it cannot be the case that $w$ observes $u$ 
and they converge to different actions. In this case, by Corollary 3.43 we have that $X_w = 1/2$, and 
then by Theorem 3.46 agent $w$'s actions do not converge, a contradiction. □ 



Theorem [T] is an easy consequence of this theorem. Recall that A u = {1} when X u > 1/2, 
A u = {0} when X u < 1/2 and A u = {0, 1} when X u = 1/2. 

Theorem 1. Let $(\mu_0, \mu_1)$ induce non-atomic beliefs. Then there exists a random variable A such 
that almost surely A u = A for all u. 



Proof. Fix an agent v. When X v < 1/2 (resp. X v > 1/2) then the first (resp. second) case of 
Corollary 3.47 occurs and A = {0} (resp. A = {1}). Likewise when X v = 1/2 then the third case 
occurs, X u = 1/2 for all u ∈ V and A u = {0, 1} for all u ∈ V. □ 



3.7 Extension to L-locally connected graphs 

The main result of this article, Theorem 2, is a statement about undirected graphs. We can extend 
the proof to a larger family of graphs, namely, L-locally connected graphs. 

Definition 3.48. Let G = (V, E) be a directed graph. G is L-locally strongly connected if, for each 
(u, w) ∈ E, there exists a path in G of length at most L from w to u. 
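
Definition 3.48 can be checked mechanically on a finite graph. The following is a minimal sketch, not part of the original argument: the function name `is_locally_strongly_connected` and the edge-list representation are assumptions made for illustration. For every edge (u, w) it runs a breadth-first search from w, truncated at depth L, and tests whether u is reached.

```python
from collections import deque


def is_locally_strongly_connected(edges, L):
    """Return True if every edge (u, w) has a directed path w -> u of length <= L.

    `edges` is an iterable of (u, w) pairs; this name and representation are
    illustrative assumptions, not notation from the paper.
    """
    edges = list(edges)
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set())

    def reachable_within(src, dst):
        # Breadth-first search from src that never explores beyond depth L.
        seen = {src}
        frontier = deque([(src, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if node == dst:
                return True
            if depth == L:
                continue
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        return False

    return all(reachable_within(w, u) for u, w in edges)


# A directed 4-cycle: the return path for each edge has length exactly 3.
cycle = [(i, (i + 1) % 4) for i in range(4)]
# The same cycle with every edge reversed as well, i.e. an undirected graph.
undirected = cycle + [(w, u) for u, w in cycle]
```

On the directed 4-cycle the check succeeds for L = 3 and fails for L = 2, while the undirected version passes with L = 1, which matches the remark that L-locally strongly connected graphs form a larger family than undirected graphs.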

Theorem 2 can be extended as follows. 

Theorem 3.49. Fix L, a positive integer. Let $\mu_0, \mu_1$ be such that for every strongly connected, 
directed graph G there exists a random variable A such that almost surely A u = A for all u ∈ V. 
Then there exists a sequence $q(n) = q(n, \mu_0, \mu_1)$ such that $q(n) \to 1$ as $n \to \infty$, and $P[A = \{S\}] \ge q(n)$, 
for any choice of L-locally strongly connected graph G with n agents. 



The proof of Theorem 3.49 is essentially identical to the proof of Theorem 2. The latter is 
a consequence of Theorem 3.41, which shows learning in bounded degree infinite graphs, and of 
Lemma 3.32, which implies asymptotic learning for sequences of graphs with diverging maximal 
degree. 

Note first that the set of L-locally strongly connected rooted graphs with degrees bounded by d 
is compact. Hence the proof of Theorem 3.41 can be used as is in the L-locally strongly connected 
setup. 



In order to apply Lemma 3.32 in this setup, we need to show that when in-degrees diverge then 
so do out-degrees. For this note that if (u, v) is a directed edge then u is in the (directed) ball of 
radius L around v. Hence, if there exists a vertex v with in-degree D then in the ball of radius L 
around it there are at least D vertices. On the other hand, if the out-degree is bounded by d, then 
the number of vertices in this ball is at most $L \cdot d^L$. Therefore, $d \to \infty$ as $D \to \infty$. 
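
The counting step can be written out explicitly. The following is a sketch under the assumptions that the graph has no self-loops, that $d \ge 1$, and that L is fixed; the symbol $B_L(v)$ is introduced here only for illustration and denotes the set of vertices other than v that can be reached from v by a directed path of length at most L.

```latex
D \;\le\; |B_L(v)| \;\le\; \sum_{k=1}^{L} d^{k} \;\le\; L \cdot d^{L}
\qquad\Longrightarrow\qquad
d \;\ge\; \left( \frac{D}{L} \right)^{1/L} \to \infty
\quad \text{as } D \to \infty .
```

The first inequality holds because each of the D in-neighbors of v lies in the ball, and the second because at most $d^k$ vertices are at distance exactly k from v.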
A Example of atomic private beliefs leading to non-learning 



We sketch an example in which private beliefs are atomic and asymptotic learning does not occur. 

Example A.1. Let the graph G be the undirected chain of length n, so that V = {1, ..., n} and 
(u, v) is an edge if |u − v| = 1. Let the private signals be bits that are each independently equal 
to S with probability 2/3. We choose here the tie breaking rule under which agents defer to their 
original signal. 

We leave the following claim as an exercise to the reader. 

Claim A.2. If an agent u has at least one neighbor with the same private signal (i.e., W u = W v 
for v a neighbor of u) then u will always take the same action A u (t) = W u . 

Since this happens with probability that is independent of n, with probability bounded away 
from zero an agent will always take the wrong action, and so asymptotic learning does not occur. It 
is also clear that optimal action sets do not become common knowledge, and these facts are indeed 
related. 
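
The probability in question can be computed exactly. The sketch below takes Claim A.2 as given and does not simulate the Bayesian dynamics; it only enumerates the private signals of an agent u and its neighbors and sums the probability that u's signal is wrong and is shared by at least one neighbor. The function name `prob_stuck_wrong` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# Each private signal equals S with probability 2/3, as in Example A.1.
P_CORRECT = Fraction(2, 3)


def prob_stuck_wrong(num_neighbors):
    """Exact P[W_u != S and some neighbor v has W_v == W_u].

    Relies on Claim A.2: in this event u always takes the action W_u,
    which is the wrong one. The function name is hypothetical.
    """
    total = Fraction(0)
    # Entry i is True when agent i's signal equals S; index 0 is agent u.
    for correct in product([True, False], repeat=num_neighbors + 1):
        p = Fraction(1)
        for c in correct:
            p *= P_CORRECT if c else 1 - P_CORRECT
        u_wrong = not correct[0]
        neighbor_agrees = any(c == correct[0] for c in correct[1:])
        if u_wrong and neighbor_agrees:
            total += p
    return total


interior = prob_stuck_wrong(2)  # agent with two neighbors on the chain
endpoint = prob_stuck_wrong(1)  # agent at either end of the chain
```

For an interior agent this gives (1/3)(1 − (2/3)²) = 5/27, and for an endpoint (1/3)(1/3) = 1/9. Neither value depends on n.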